Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-21 Thread Michael Armbrust
This vote fails.  Please test RC5.


Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-21 Thread Nick Pentreath
Thanks, I added the details of my environment to the JIRA (for what it's
worth now, as the issue is identified).


Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-20 Thread Xiao Li
Found another bug, in the case preservation of column names for persistent
views. This regression was introduced in 2.2:
https://issues.apache.org/jira/browse/SPARK-21150
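For context on what "case preserving" means here, a generic sketch (the `Column` class and lookup scheme below are illustrative assumptions, not Spark's actual catalog code): a catalog can resolve column names case-insensitively while still storing the casing the user originally wrote, and losing that stored casing is the kind of regression being described.

```python
# Hypothetical catalog entry: names are resolved case-insensitively,
# but the original casing must be preserved for display (e.g. in a view's schema).
class Column:
    def __init__(self, name):
        self.name = name            # original casing, as the user wrote it
        self.key = name.lower()     # key used for case-insensitive resolution

columns = [Column("empId"), Column("Name")]
by_key = {c.key: c for c in columns}

# Resolution ignores case...
assert by_key["empid"].name == "empId"
# ...but the preserved display names keep the user's casing.
assert [c.name for c in columns] == ["empId", "Name"]
```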

Thanks,

Xiao



Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-19 Thread Liang-Chi Hsieh

I mean it is not a bug that existed before this feature was added. Of course
the kryo serializer with 2000+ partitions worked before this feature.





Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-19 Thread Koert Kuipers
If a feature added recently breaks using kryo serializer with 2000+
partitions then how can it not be a regression? I mean I use kryo with more
than 2000 partitions all the time, and it worked before. Or was I simply
not hitting this bug because there are other conditions that also need to
be satisfied besides kryo and 2000+ partitions?
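For background on why 2000 is the threshold (an inference about Spark's shuffle internals, not something stated in this thread): Spark switches to a more heavily compressed map-status encoding once a shuffle has more than 2000 partitions, so a bug in that code path only surfaces above the threshold. A pure-Python sketch of that kind of threshold dispatch (names are illustrative, not Spark's API):

```python
# Hedged illustration: Spark picks a different MapStatus encoding once a
# shuffle exceeds 2000 partitions, so bugs in the "highly compressed" path
# only surface at 2000+ partitions. Names here mirror Spark's classes but
# the dispatch function itself is illustrative.
THRESHOLD = 2000

def choose_map_status(num_partitions: int) -> str:
    # At or below the threshold, sizes are tracked per partition; above it,
    # they are summarized (average size plus a bitmap of empty blocks).
    if num_partitions > THRESHOLD:
        return "HighlyCompressedMapStatus"
    return "CompressedMapStatus"

assert choose_map_status(2000) == "CompressedMapStatus"
assert choose_map_status(2001) == "HighlyCompressedMapStatus"
```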



Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-19 Thread Liang-Chi Hsieh

I think it's not. This is a feature added recently.





Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-18 Thread Hyukjin Kwon
Is this a regression BTW? I am just curious.



Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-18 Thread Liang-Chi Hsieh
-1. When using the kryo serializer with a partition number greater than 2000,
there seems to be an NPE issue that needs to be fixed.

SPARK-21133 <https://issues.apache.org/jira/browse/SPARK-21133>




-
Liang-Chi Hsieh | @viirya 
Spark Technology Center 
http://www.spark.tc/ 
--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Apache-Spark-2-2-0-RC4-tp21677p21784.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-15 Thread Felix Cheung
Sounds good.

Think we checked and should be good to go. Appreciated.





2017-06-14 16:07 GMT+09:00 Felix Cheung <felixcheun...@hotmail.com>:
Thanks! Will try to setup RHEL/CentOS to test it out

_
From: Nick Pentreath <nick.pentre...@gmail.com>
Sent: Tuesday, June 13, 2017 11:38 PM
Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
To: Felix Cheung <felixcheun...@hotmail.com>, Hyukjin Kwon <gurwls...@gmail.com>, dev <dev@spark.apache.org>

Cc: Sean Owen <so...@cloudera.com>


Hi, yeah, sorry for the slow response - I was on RHEL and OpenJDK but will
have to report back later with the versions, as I am AFK.

Not totally sure about the R version, but again, will revert ASAP.
On Wed, 14 Jun 2017 at 05:09, Felix Cheung <felixcheun...@hotmail.com> wrote:
Thanks
This was with an external package and unrelated

  >> macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning 
(https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)

As for CentOS - would it be possible to test against R older than 3.4.0? This 
is the same error reported by Nick below.

_
From: Hyukjin Kwon <gurwls...@gmail.com>
Sent: Tuesday, June 13, 2017 8:02 PM

Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
To: dev <dev@spark.apache.org>
Cc: Sean Owen <so...@cloudera.com>, Nick Pentreath <nick.pentre...@gmail.com>, Felix Cheung <felixcheun...@hotmail.com>



For the test failure on R, I checked:


Per https://github.com/apache/spark/tree/v2.2.0-rc4,

1. Windows Server 2012 R2 / R 3.3.1 - passed 
(https://ci.appveyor.com/project/spark-test/spark/build/755-r-test-v2.2.0-rc4)
2. macOS Sierra 10.12.3 / R 3.4.0 - passed
3. macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning 
(https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)
4. CentOS 7.2.1511 / R 3.4.0 - reproduced 
(https://gist.github.com/HyukjinKwon/2a736b9f80318618cc147ac2bb1a987d)


Per https://github.com/apache/spark/tree/v2.1.1,

1. CentOS 7.2.1511 / R 3.4.0 - reproduced 
(https://gist.github.com/HyukjinKwon/6064b0d10bab8fc1dc6212452d83b301)


This appears to fail only on CentOS 7.2.1511 / R 3.4.0, given my tests and
observations.

This also fails in Spark 2.1.1, so it does not sound like a regression, although
it is a bug that should be fixed (whether in Spark or in R).



Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-14 Thread Michael Armbrust
So, it looks like SPARK-21085
<https://issues.apache.org/jira/browse/SPARK-21085> has been fixed and
SPARK-21093 <https://issues.apache.org/jira/browse/SPARK-21093> is not a
regression.  Last call before I cut RC5.


Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-14 Thread Hyukjin Kwon
Actually, I opened - https://issues.apache.org/jira/browse/SPARK-21093.


Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-14 Thread Hyukjin Kwon
For a shorter reproducer ...


df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d"))
collect(gapply(df, "a", function(key, x) { x }, schema(df)))

Running the snippet below multiple times (5~7 times):

collect(gapply(df, "a", function(key, x) { x }, schema(df)))

occasionally throws an error.


I will leave it at that and will provide more details once a JIRA is opened.
This does not look like a regression anyway.
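For readers less familiar with SparkR: `gapply` groups the DataFrame by the given columns and applies an R function to each group, returning the combined result. A rough pure-Python analogue of the identity reproducer above (illustrative only; this is not the SparkR API):

```python
from collections import defaultdict

def gapply(rows, key_col, func):
    """Group rows (dicts) by key_col and apply func(key, group) to each group,
    concatenating the results -- a rough analogue of SparkR's gapply."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_col]].append(row)
    out = []
    for key, group in groups.items():
        out.extend(func(key, group))
    return out

# Identity function per group, mirroring function(key, x) { x } in the thread.
rows = [{"a": 1, "b": 1.0}, {"a": 1, "b": 2.0}, {"a": 2, "b": 3.0}]
result = gapply(rows, "a", lambda key, x: x)
assert sorted(r["b"] for r in result) == [1.0, 2.0, 3.0]
```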



>>> 2017-06-14 8:28 GMT+09:00 Xiao Li <gatorsm...@gmail.com>:
>>>
>>>> -1
>>>>
>>>> Spark 2.2 is unable to read the partitioned table created by Spark 2.1
>>>> or earlier.
>>>>
>>>> Opened a JIRA https://issues.apache.org/jira/browse/SPARK-21085
>>>>
>>>> Will fix it soon.
>>>>
>>>> Thanks,
>>>>
>>>> Xiao Li

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-14 Thread Hyukjin Kwon
Per https://github.com/apache/spark/tree/v2.1.1,

1. CentOS 7.2.1511 / R 3.3.3 - this test hangs.

I messed it up a bit while downgrading R to 3.3.3 (it was an actual
machine, not a VM), so it took me a while to retry this.
I rebuilt it and confirmed the R version is 3.3.3. I hope this one can be
double-checked.

Here is the self-reproducer:

irisDF <- suppressWarnings(createDataFrame(iris))
schema <- structType(structField("Sepal_Length", "double"),
                     structField("Avg", "double"))
df4 <- gapply(
  irisDF,
  cols = "Sepal_Length",
  function(key, x) {
    data.frame(key, mean(x$Sepal_Width), stringsAsFactors = FALSE)
  },
  schema)
collect(df4)
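For readers less familiar with SparkR, the reproducer above is a
split-apply-combine job: group the iris rows by Sepal_Length and compute the
mean Sepal_Width within each group. A plain-Python sketch of the same
computation follows (illustrative only; the rows below are a hand-picked
stand-in for the iris data, and the sketch exercises none of the Spark/R
serialization path where the bug lives):

```python
from collections import defaultdict

# (Sepal_Length, Sepal_Width) pairs; a tiny stand-in for the iris rows.
rows = [(5.1, 3.5), (4.9, 3.0), (5.1, 3.8), (4.9, 3.1)]

# The "group by key" step: bucket widths by their Sepal_Length key.
groups = defaultdict(list)
for length, width in rows:
    groups[length].append(width)

# The per-group function: mean Sepal_Width for each key.
result = {k: round(sum(v) / len(v), 2) for k, v in groups.items()}
print(result)  # {5.1: 3.65, 4.9: 3.05}
```

gapply does the same thing, except the per-group function is shipped to
executors and the results are deserialized back through the declared schema.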



2017-06-14 16:07 GMT+09:00 Felix Cheung <felixcheun...@hotmail.com>:

> Thanks! Will try to set up RHEL/CentOS to test it out
>
> _
> From: Nick Pentreath <nick.pentre...@gmail.com>
> Sent: Tuesday, June 13, 2017 11:38 PM
> Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
> To: Felix Cheung <felixcheun...@hotmail.com>, Hyukjin Kwon <
> gurwls...@gmail.com>, dev <dev@spark.apache.org>
>
> Cc: Sean Owen <so...@cloudera.com>
>
>
> Hi, yeah, sorry for the slow response - I was on RHEL and OpenJDK but will
> have to report back later with the versions, as I am AFK.
>
> Not totally sure of the R version, but again will revert ASAP.
> On Wed, 14 Jun 2017 at 05:09, Felix Cheung <felixcheun...@hotmail.com>
> wrote:
>
>> Thanks
>> This was with an external package and unrelated
>>
>>   >> macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning (
>> https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)
>>
>> As for CentOS - would it be possible to test against R older than 3.4.0?
>> This is the same error reported by Nick below.
>>
>> _
>> From: Hyukjin Kwon <gurwls...@gmail.com>
>> Sent: Tuesday, June 13, 2017 8:02 PM
>>
>> Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
>> To: dev <dev@spark.apache.org>
>> Cc: Sean Owen <so...@cloudera.com>, Nick Pentreath <
>> nick.pentre...@gmail.com>, Felix Cheung <felixcheun...@hotmail.com>
>>
>>
>>
>> For the test failure on R, I checked:
>>
>>
>> Per https://github.com/apache/spark/tree/v2.2.0-rc4,
>>
>> 1. Windows Server 2012 R2 / R 3.3.1 - passed (https://ci.appveyor.com/
>> project/spark-test/spark/build/755-r-test-v2.2.0-rc4)
>> 2. macOS Sierra 10.12.3 / R 3.4.0 - passed
>> 3. macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning (
>> https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)
>> 4. CentOS 7.2.1511 / R 3.4.0 - reproduced (https://gist.github.com/
>> HyukjinKwon/2a736b9f80318618cc147ac2bb1a987d)
>>
>>
>> Per https://github.com/apache/spark/tree/v2.1.1,
>>
>> 1. CentOS 7.2.1511 / R 3.4.0 - reproduced (https://gist.github.com/
>> HyukjinKwon/6064b0d10bab8fc1dc6212452d83b301)
>>
>>
>> This appears to fail only on CentOS 7.2.1511 / R 3.4.0, given my tests
>> and observations.
>>
>> It also fails in Spark 2.1.1, so it does not sound like a regression,
>> although it is a bug that should be fixed (whether in Spark or R).
>>
>>
>> 2017-06-14 8:28 GMT+09:00 Xiao Li <gatorsm...@gmail.com>:
>>
>>> -1
>>>
>>> Spark 2.2 is unable to read the partitioned table created by Spark 2.1
>>> or earlier.
>>>
>>> Opened a JIRA https://issues.apache.org/jira/browse/SPARK-21085
>>>
>>> Will fix it soon.
>>>
>>> Thanks,
>>>
>>> Xiao Li
>>>
>>>
>>>
>>> 2017-06-13 9:39 GMT-07:00 Joseph Bradley <jos...@databricks.com>:
>>>
>>>> Re: the QA JIRAs:
>>>> Thanks for discussing them.  I still feel they are very helpful; I
>>>> particularly notice not having to spend a solid 2-3 weeks of time QAing
>>>> (unlike in earlier Spark releases).  One other point not mentioned above: I
>>>> think they serve as a very helpful reminder/training for the community for
>>>> rigor in development.  Since we instituted QA JIRAs, contributors have been
>>>> a lot better about adding in docs early, rather than waiting until the end
>>>> of the cycle (though I know this is drawing conclusions from correlations).
>>>>
>>>> I would vote in favor of the RC...but I'll wait to see about the
>>>> reported failures.
>>>>
>>>> On Fri, Jun 9, 2017 at 3:30 PM, Sean Owen <so...@cloudera.com> wrote:

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-14 Thread Felix Cheung
Thanks! Will try to set up RHEL/CentOS to test it out

_
From: Nick Pentreath <nick.pentre...@gmail.com>
Sent: Tuesday, June 13, 2017 11:38 PM
Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
To: Felix Cheung <felixcheun...@hotmail.com>, Hyukjin Kwon
<gurwls...@gmail.com>, dev <dev@spark.apache.org>
Cc: Sean Owen <so...@cloudera.com>


Hi, yeah, sorry for the slow response - I was on RHEL and OpenJDK but will
have to report back later with the versions, as I am AFK.

Not totally sure of the R version, but again will revert ASAP.
On Wed, 14 Jun 2017 at 05:09, Felix Cheung <felixcheun...@hotmail.com> wrote:
Thanks
This was with an external package and unrelated

  >> macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning 
(https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)

As for CentOS - would it be possible to test against R older than 3.4.0? This 
is the same error reported by Nick below.

_
From: Hyukjin Kwon <gurwls...@gmail.com>
Sent: Tuesday, June 13, 2017 8:02 PM

Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
To: dev <dev@spark.apache.org>
Cc: Sean Owen <so...@cloudera.com>, Nick Pentreath <nick.pentre...@gmail.com>,
Felix Cheung <felixcheun...@hotmail.com>



For the test failure on R, I checked:


Per https://github.com/apache/spark/tree/v2.2.0-rc4,

1. Windows Server 2012 R2 / R 3.3.1 - passed 
(https://ci.appveyor.com/project/spark-test/spark/build/755-r-test-v2.2.0-rc4)
2. macOS Sierra 10.12.3 / R 3.4.0 - passed
3. macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning 
(https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)
4. CentOS 7.2.1511 / R 3.4.0 - reproduced 
(https://gist.github.com/HyukjinKwon/2a736b9f80318618cc147ac2bb1a987d)


Per https://github.com/apache/spark/tree/v2.1.1,

1. CentOS 7.2.1511 / R 3.4.0 - reproduced 
(https://gist.github.com/HyukjinKwon/6064b0d10bab8fc1dc6212452d83b301)


This appears to fail only on CentOS 7.2.1511 / R 3.4.0, given my tests and
observations.

It also fails in Spark 2.1.1, so it does not sound like a regression, although
it is a bug that should be fixed (whether in Spark or R).


2017-06-14 8:28 GMT+09:00 Xiao Li <gatorsm...@gmail.com>:
-1

Spark 2.2 is unable to read the partitioned table created by Spark 2.1 or 
earlier.

Opened a JIRA https://issues.apache.org/jira/browse/SPARK-21085

Will fix it soon.

Thanks,

Xiao Li



2017-06-13 9:39 GMT-07:00 Joseph Bradley <jos...@databricks.com>:
Re: the QA JIRAs:
Thanks for discussing them.  I still feel they are very helpful; I particularly 
notice not having to spend a solid 2-3 weeks of time QAing (unlike in earlier 
Spark releases).  One other point not mentioned above: I think they serve as a 
very helpful reminder/training for the community for rigor in development.  
Since we instituted QA JIRAs, contributors have been a lot better about adding 
in docs early, rather than waiting until the end of the cycle (though I know 
this is drawing conclusions from correlations).

I would vote in favor of the RC...but I'll wait to see about the reported 
failures.

On Fri, Jun 9, 2017 at 3:30 PM, Sean Owen <so...@cloudera.com> wrote:
Different errors as in https://issues.apache.org/jira/browse/SPARK-20520 but 
that's also reporting R test failures.

I went back and tried to run the R tests and they passed, at least on Ubuntu 17 
/ R 3.3.


On Fri, Jun 9, 2017 at 9:12 AM Nick Pentreath <nick.pentre...@gmail.com> wrote:
All Scala, Python tests pass. ML QA and doc issues are resolved (as well as R 
it seems).

However, I'm seeing the following test failure on R consistently: 
https://gist.github.com/MLnick/5f26152f97ae8473f807c6895817cf72


On Thu, 8 Jun 2017 at 08:48 Denny Lee <denny.g@gmail.com> wrote:
+1 non-binding

Tested on macOS Sierra, Ubuntu 16.04
test suite includes various test cases including Spark SQL, ML, GraphFrames, 
Structured Streaming


On Wed, Jun 7, 2017 at 9:40 PM vaquar khan <vaquar.k...@gmail.com> wrote:
+1 non-binding

Regards,
vaquar khan

On Jun 7, 2017 4:32 PM, "Ricardo Almeida" <ricardo.alme...@actnowib.com> wrote:
+1 (non-binding)

Built and tested with -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn -Phive 
-Phive-thriftserver -Pscala-2.11

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-14 Thread Nick Pentreath
Hi, yeah, sorry for the slow response - I was on RHEL and OpenJDK but will
have to report back later with the versions, as I am AFK.

Not totally sure of the R version, but again will revert ASAP.
On Wed, 14 Jun 2017 at 05:09, Felix Cheung <felixcheun...@hotmail.com>
wrote:

> Thanks
> This was with an external package and unrelated
>
>   >> macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning (
> https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)
>
> As for CentOS - would it be possible to test against R older than 3.4.0?
> This is the same error reported by Nick below.
>
> _
> From: Hyukjin Kwon <gurwls...@gmail.com>
> Sent: Tuesday, June 13, 2017 8:02 PM
>
> Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
> To: dev <dev@spark.apache.org>
> Cc: Sean Owen <so...@cloudera.com>, Nick Pentreath <
> nick.pentre...@gmail.com>, Felix Cheung <felixcheun...@hotmail.com>
>
>
>
> For the test failure on R, I checked:
>
>
> Per https://github.com/apache/spark/tree/v2.2.0-rc4,
>
> 1. Windows Server 2012 R2 / R 3.3.1 - passed (
> https://ci.appveyor.com/project/spark-test/spark/build/755-r-test-v2.2.0-rc4
> )
> 2. macOS Sierra 10.12.3 / R 3.4.0 - passed
> 3. macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning (
> https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)
> 4. CentOS 7.2.1511 / R 3.4.0 - reproduced (
> https://gist.github.com/HyukjinKwon/2a736b9f80318618cc147ac2bb1a987d)
>
>
> Per https://github.com/apache/spark/tree/v2.1.1,
>
> 1. CentOS 7.2.1511 / R 3.4.0 - reproduced (
> https://gist.github.com/HyukjinKwon/6064b0d10bab8fc1dc6212452d83b301)
>
>
> This appears to fail only on CentOS 7.2.1511 / R 3.4.0, given my tests
> and observations.
>
> It also fails in Spark 2.1.1, so it does not sound like a regression,
> although it is a bug that should be fixed (whether in Spark or R).
>
>
> 2017-06-14 8:28 GMT+09:00 Xiao Li <gatorsm...@gmail.com>:
>
>> -1
>>
>> Spark 2.2 is unable to read the partitioned table created by Spark 2.1 or
>> earlier.
>>
>> Opened a JIRA https://issues.apache.org/jira/browse/SPARK-21085
>>
>> Will fix it soon.
>>
>> Thanks,
>>
>> Xiao Li
>>
>>
>>
>> 2017-06-13 9:39 GMT-07:00 Joseph Bradley <jos...@databricks.com>:
>>
>>> Re: the QA JIRAs:
>>> Thanks for discussing them.  I still feel they are very helpful; I
>>> particularly notice not having to spend a solid 2-3 weeks of time QAing
>>> (unlike in earlier Spark releases).  One other point not mentioned above: I
>>> think they serve as a very helpful reminder/training for the community for
>>> rigor in development.  Since we instituted QA JIRAs, contributors have been
>>> a lot better about adding in docs early, rather than waiting until the end
>>> of the cycle (though I know this is drawing conclusions from correlations).
>>>
>>> I would vote in favor of the RC...but I'll wait to see about the
>>> reported failures.
>>>
>>> On Fri, Jun 9, 2017 at 3:30 PM, Sean Owen <so...@cloudera.com> wrote:
>>>
>>>> Different errors as in
>>>> https://issues.apache.org/jira/browse/SPARK-20520 but that's also
>>>> reporting R test failures.
>>>>
>>>> I went back and tried to run the R tests and they passed, at least on
>>>> Ubuntu 17 / R 3.3.
>>>>
>>>>
>>>> On Fri, Jun 9, 2017 at 9:12 AM Nick Pentreath <nick.pentre...@gmail.com>
>>>> wrote:
>>>>
>>>>> All Scala, Python tests pass. ML QA and doc issues are resolved (as
>>>>> well as R it seems).
>>>>>
>>>>> However, I'm seeing the following test failure on R consistently:
>>>>> https://gist.github.com/MLnick/5f26152f97ae8473f807c6895817cf72
>>>>>
>>>>>
>>>>> On Thu, 8 Jun 2017 at 08:48 Denny Lee <denny.g@gmail.com> wrote:
>>>>>
>>>>>> +1 non-binding
>>>>>>
>>>>>> Tested on macOS Sierra, Ubuntu 16.04
>>>>>> test suite includes various test cases including Spark SQL, ML,
>>>>>> GraphFrames, Structured Streaming
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 7, 2017 at 9:40 PM vaquar khan <vaquar.k...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> +1 non-binding
>>>>>>>
>>>>>>> Regards,
>>>>>>> vaquar khan

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-13 Thread Felix Cheung
Thanks
This was with an external package and unrelated

  >> macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning 
(https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)

As for CentOS - would it be possible to test against R older than 3.4.0? This 
is the same error reported by Nick below.

_
From: Hyukjin Kwon <gurwls...@gmail.com>
Sent: Tuesday, June 13, 2017 8:02 PM
Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
To: dev <dev@spark.apache.org>
Cc: Sean Owen <so...@cloudera.com>, Nick Pentreath <nick.pentre...@gmail.com>,
Felix Cheung <felixcheun...@hotmail.com>


For the test failure on R, I checked:


Per https://github.com/apache/spark/tree/v2.2.0-rc4,

1. Windows Server 2012 R2 / R 3.3.1 - passed 
(https://ci.appveyor.com/project/spark-test/spark/build/755-r-test-v2.2.0-rc4)
2. macOS Sierra 10.12.3 / R 3.4.0 - passed
3. macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning 
(https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)
4. CentOS 7.2.1511 / R 3.4.0 - reproduced 
(https://gist.github.com/HyukjinKwon/2a736b9f80318618cc147ac2bb1a987d)


Per https://github.com/apache/spark/tree/v2.1.1,

1. CentOS 7.2.1511 / R 3.4.0 - reproduced 
(https://gist.github.com/HyukjinKwon/6064b0d10bab8fc1dc6212452d83b301)


This appears to fail only on CentOS 7.2.1511 / R 3.4.0, given my tests and
observations.

It also fails in Spark 2.1.1, so it does not sound like a regression, although
it is a bug that should be fixed (whether in Spark or R).


2017-06-14 8:28 GMT+09:00 Xiao Li <gatorsm...@gmail.com>:
-1

Spark 2.2 is unable to read the partitioned table created by Spark 2.1 or 
earlier.

Opened a JIRA https://issues.apache.org/jira/browse/SPARK-21085

Will fix it soon.

Thanks,

Xiao Li



2017-06-13 9:39 GMT-07:00 Joseph Bradley <jos...@databricks.com>:
Re: the QA JIRAs:
Thanks for discussing them.  I still feel they are very helpful; I particularly 
notice not having to spend a solid 2-3 weeks of time QAing (unlike in earlier 
Spark releases).  One other point not mentioned above: I think they serve as a 
very helpful reminder/training for the community for rigor in development.  
Since we instituted QA JIRAs, contributors have been a lot better about adding 
in docs early, rather than waiting until the end of the cycle (though I know 
this is drawing conclusions from correlations).

I would vote in favor of the RC...but I'll wait to see about the reported 
failures.

On Fri, Jun 9, 2017 at 3:30 PM, Sean Owen <so...@cloudera.com> wrote:
Different errors as in https://issues.apache.org/jira/browse/SPARK-20520 but 
that's also reporting R test failures.

I went back and tried to run the R tests and they passed, at least on Ubuntu 17 
/ R 3.3.


On Fri, Jun 9, 2017 at 9:12 AM Nick Pentreath <nick.pentre...@gmail.com> wrote:
All Scala, Python tests pass. ML QA and doc issues are resolved (as well as R 
it seems).

However, I'm seeing the following test failure on R consistently: 
https://gist.github.com/MLnick/5f26152f97ae8473f807c6895817cf72


On Thu, 8 Jun 2017 at 08:48 Denny Lee <denny.g@gmail.com> wrote:
+1 non-binding

Tested on macOS Sierra, Ubuntu 16.04
test suite includes various test cases including Spark SQL, ML, GraphFrames, 
Structured Streaming


On Wed, Jun 7, 2017 at 9:40 PM vaquar khan <vaquar.k...@gmail.com> wrote:
+1 non-binding

Regards,
vaquar khan

On Jun 7, 2017 4:32 PM, "Ricardo Almeida" <ricardo.alme...@actnowib.com> wrote:
+1 (non-binding)

Built and tested with -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn -Phive 
-Phive-thriftserver -Pscala-2.11 on

  *   Ubuntu 17.04, Java 8 (OpenJDK 1.8.0_111)
  *   macOS 10.12.5 Java 8 (build 1.8.0_131)

On 5 June 2017 at 21:14, Michael Armbrust <mich...@databricks.com> wrote:
Please vote on releasing the following candidate as Apache Spark version 2.2.0. 
The vote is open until Thurs, June 8th, 2017 at 12:00 PST and passes if a 
majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.2.0
[ ] -1 Do not release this package because ...
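
The passing condition stated above can be written as a small predicate. This
is an illustration of the usual reading of an Apache release vote (at least
three binding +1 votes, and more +1s than -1s), not official ASF tooling; the
function name is mine:

```python
def vote_passes(plus_ones: int, minus_ones: int) -> bool:
    """Usual reading of 'a majority of at least 3 +1 PMC votes'."""
    # Quorum: at least three binding +1s; majority: +1s outnumber -1s.
    return plus_ones >= 3 and plus_ones > minus_ones

print(vote_passes(3, 0))  # True: quorum met and a clear majority
print(vote_passes(4, 4))  # False: no majority
print(vote_passes(2, 0))  # False: fewer than three binding +1s
```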


To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is 
v2.2.0-rc4<https://github.com/apache/spark/tree/v2.2.0-rc4> 
(377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e)

List of JIRA tickets resolved can be found with this 
filter<https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AN

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-13 Thread Hyukjin Kwon
For the test failure on R, I checked:


Per https://github.com/apache/spark/tree/v2.2.0-rc4,

1. Windows Server 2012 R2 / R 3.3.1 - passed (
https://ci.appveyor.com/project/spark-test/spark/build/755-r-test-v2.2.0-rc4
)
2. macOS Sierra 10.12.3 / R 3.4.0 - passed
3. macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning (
https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)
4. CentOS 7.2.1511 / R 3.4.0 - reproduced (
https://gist.github.com/HyukjinKwon/2a736b9f80318618cc147ac2bb1a987d)


Per https://github.com/apache/spark/tree/v2.1.1,

1. CentOS 7.2.1511 / R 3.4.0 - reproduced (
https://gist.github.com/HyukjinKwon/6064b0d10bab8fc1dc6212452d83b301)


This appears to fail only on CentOS 7.2.1511 / R 3.4.0, given my tests
and observations.

It also fails in Spark 2.1.1, so it does not sound like a regression, although
it is a bug that should be fixed (whether in Spark or R).


2017-06-14 8:28 GMT+09:00 Xiao Li :

> -1
>
> Spark 2.2 is unable to read the partitioned table created by Spark 2.1 or
> earlier.
>
> Opened a JIRA https://issues.apache.org/jira/browse/SPARK-21085
>
> Will fix it soon.
>
> Thanks,
>
> Xiao Li
>
>
>
> 2017-06-13 9:39 GMT-07:00 Joseph Bradley :
>
>> Re: the QA JIRAs:
>> Thanks for discussing them.  I still feel they are very helpful; I
>> particularly notice not having to spend a solid 2-3 weeks of time QAing
>> (unlike in earlier Spark releases).  One other point not mentioned above: I
>> think they serve as a very helpful reminder/training for the community for
>> rigor in development.  Since we instituted QA JIRAs, contributors have been
>> a lot better about adding in docs early, rather than waiting until the end
>> of the cycle (though I know this is drawing conclusions from correlations).
>>
>> I would vote in favor of the RC...but I'll wait to see about the reported
>> failures.
>>
>> On Fri, Jun 9, 2017 at 3:30 PM, Sean Owen  wrote:
>>
>>> Different errors as in https://issues.apache.org/jira/browse/SPARK-20520 but
>>> that's also reporting R test failures.
>>>
>>> I went back and tried to run the R tests and they passed, at least on
>>> Ubuntu 17 / R 3.3.
>>>
>>>
>>> On Fri, Jun 9, 2017 at 9:12 AM Nick Pentreath 
>>> wrote:
>>>
 All Scala, Python tests pass. ML QA and doc issues are resolved (as
 well as R it seems).

 However, I'm seeing the following test failure on R consistently:
 https://gist.github.com/MLnick/5f26152f97ae8473f807c6895817cf72


 On Thu, 8 Jun 2017 at 08:48 Denny Lee  wrote:

> +1 non-binding
>
> Tested on macOS Sierra, Ubuntu 16.04
> test suite includes various test cases including Spark SQL, ML,
> GraphFrames, Structured Streaming
>
>
> On Wed, Jun 7, 2017 at 9:40 PM vaquar khan 
> wrote:
>
>> +1 non-binding
>>
>> Regards,
>> vaquar khan
>>
>> On Jun 7, 2017 4:32 PM, "Ricardo Almeida" <
>> ricardo.alme...@actnowib.com> wrote:
>>
>> +1 (non-binding)
>>
>> Built and tested with -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn
>> -Phive -Phive-thriftserver -Pscala-2.11 on
>>
>>- Ubuntu 17.04, Java 8 (OpenJDK 1.8.0_111)
>>- macOS 10.12.5 Java 8 (build 1.8.0_131)
>>
>>
>> On 5 June 2017 at 21:14, Michael Armbrust 
>> wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark
>>> version 2.2.0. The vote is open until Thurs, June 8th, 2017 at
>>> 12:00 PST and passes if a majority of at least 3 +1 PMC votes are
>>> cast.
>>>
>>> [ ] +1 Release this package as Apache Spark 2.2.0
>>> [ ] -1 Do not release this package because ...
>>>
>>>
>>> To learn more about Apache Spark, please see
>>> http://spark.apache.org/
>>>
>>> The tag to be voted on is v2.2.0-rc4
>>> (377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e)
>>>
>>> List of JIRA tickets resolved can be found with this filter.
>>>
>>> The release files, including signatures, digests, etc. can be found
>>> at:
>>> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://people.apache.org/keys/committer/pwendell.asc
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapache
>>> spark-1241/
>>>
>>> The documentation corresponding to this release can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-2.2.
>>> 0-rc4-docs/
>>>
>>>
>>> *FAQ*
>>>
>>> *How can I help test this 

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-13 Thread Xiao Li
-1

Spark 2.2 is unable to read the partitioned table created by Spark 2.1 or
earlier.

Opened a JIRA https://issues.apache.org/jira/browse/SPARK-21085

Will fix it soon.

Thanks,

Xiao Li



2017-06-13 9:39 GMT-07:00 Joseph Bradley :

> Re: the QA JIRAs:
> Thanks for discussing them.  I still feel they are very helpful; I
> particularly notice not having to spend a solid 2-3 weeks of time QAing
> (unlike in earlier Spark releases).  One other point not mentioned above: I
> think they serve as a very helpful reminder/training for the community for
> rigor in development.  Since we instituted QA JIRAs, contributors have been
> a lot better about adding in docs early, rather than waiting until the end
> of the cycle (though I know this is drawing conclusions from correlations).
>
> I would vote in favor of the RC...but I'll wait to see about the reported
> failures.
>
> On Fri, Jun 9, 2017 at 3:30 PM, Sean Owen  wrote:
>
>> Different errors as in https://issues.apache.org/jira/browse/SPARK-20520 but
>> that's also reporting R test failures.
>>
>> I went back and tried to run the R tests and they passed, at least on
>> Ubuntu 17 / R 3.3.
>>
>>
>> On Fri, Jun 9, 2017 at 9:12 AM Nick Pentreath 
>> wrote:
>>
>>> All Scala, Python tests pass. ML QA and doc issues are resolved (as well
>>> as R it seems).
>>>
>>> However, I'm seeing the following test failure on R consistently:
>>> https://gist.github.com/MLnick/5f26152f97ae8473f807c6895817cf72
>>>
>>>
>>> On Thu, 8 Jun 2017 at 08:48 Denny Lee  wrote:
>>>
 +1 non-binding

 Tested on macOS Sierra, Ubuntu 16.04
 test suite includes various test cases including Spark SQL, ML,
 GraphFrames, Structured Streaming


 On Wed, Jun 7, 2017 at 9:40 PM vaquar khan 
 wrote:

> +1 non-binding
>
> Regards,
> vaquar khan
>
> On Jun 7, 2017 4:32 PM, "Ricardo Almeida" <
> ricardo.alme...@actnowib.com> wrote:
>
> +1 (non-binding)
>
> Built and tested with -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn
> -Phive -Phive-thriftserver -Pscala-2.11 on
>
>- Ubuntu 17.04, Java 8 (OpenJDK 1.8.0_111)
>- macOS 10.12.5 Java 8 (build 1.8.0_131)
>
>
> On 5 June 2017 at 21:14, Michael Armbrust 
> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark
>> version 2.2.0. The vote is open until Thurs, June 8th, 2017 at 12:00
>> PST and passes if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 2.2.0
>> [ ] -1 Do not release this package because ...
>>
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v2.2.0-rc4
>> (377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e)
>>
>> List of JIRA tickets resolved can be found with this filter.
>>
>> The release files, including signatures, digests, etc. can be found
>> at:
>> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapache
>> spark-1241/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-2.2.
>> 0-rc4-docs/
>>
>>
>> *FAQ*
>>
>> *How can I help test this release?*
>>
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> *What should happen to JIRA tickets still targeting 2.2.0?*
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should be
>> worked on immediately. Everything else please retarget to 2.3.0 or 2.2.1.
>>
>> *But my bug isn't fixed!??!*
>>
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from 2.1.1.
>>
>
>
>
>
>
> --
>
> Joseph Bradley
>
> Software Engineer - Machine Learning
>
> Databricks, Inc.
>
> [image: http://databricks.com] 
>


Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-13 Thread Joseph Bradley
Re: the QA JIRAs:
Thanks for discussing them.  I still feel they are very helpful; I
particularly notice not having to spend a solid 2-3 weeks of time QAing
(unlike in earlier Spark releases).  One other point not mentioned above: I
think they serve as a very helpful reminder/training for the community for
rigor in development.  Since we instituted QA JIRAs, contributors have been
a lot better about adding in docs early, rather than waiting until the end
of the cycle (though I know this is drawing conclusions from correlations).

I would vote in favor of the RC...but I'll wait to see about the reported
failures.

On Fri, Jun 9, 2017 at 3:30 PM, Sean Owen  wrote:

> Different errors as in https://issues.apache.org/jira/browse/SPARK-20520 but
> that's also reporting R test failures.
>
> I went back and tried to run the R tests and they passed, at least on
> Ubuntu 17 / R 3.3.
>
>
> On Fri, Jun 9, 2017 at 9:12 AM Nick Pentreath 
> wrote:
>
>> All Scala, Python tests pass. ML QA and doc issues are resolved (as well
>> as R it seems).
>>
>> However, I'm seeing the following test failure on R consistently:
>> https://gist.github.com/MLnick/5f26152f97ae8473f807c6895817cf72
>>
>>
>> On Thu, 8 Jun 2017 at 08:48 Denny Lee  wrote:
>>
>>> +1 non-binding
>>>
>>> Tested on macOS Sierra, Ubuntu 16.04
>>> test suite includes various test cases including Spark SQL, ML,
>>> GraphFrames, Structured Streaming
>>>
>>>
>>> On Wed, Jun 7, 2017 at 9:40 PM vaquar khan 
>>> wrote:
>>>
 +1 non-binding

 Regards,
 vaquar khan

 On Jun 7, 2017 4:32 PM, "Ricardo Almeida" 
 wrote:

 +1 (non-binding)

 Built and tested with -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn
 -Phive -Phive-thriftserver -Pscala-2.11 on

- Ubuntu 17.04, Java 8 (OpenJDK 1.8.0_111)
- macOS 10.12.5 Java 8 (build 1.8.0_131)


 On 5 June 2017 at 21:14, Michael Armbrust 
 wrote:

> Please vote on releasing the following candidate as Apache Spark
> version 2.2.0. The vote is open until Thurs, June 8th, 2017 at 12:00
> PST and passes if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.2.0
> [ ] -1 Do not release this package because ...
>
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.2.0-rc4
> (377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e)
>
> List of JIRA tickets resolved can be found with this filter.
>
> The release files, including signatures, digests, etc. can be found at:
> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/
> orgapachespark-1241/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-
> 2.2.0-rc4-docs/
>
>
> *FAQ*
>
> *How can I help test this release?*
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> *What should happen to JIRA tickets still targeting 2.2.0?*
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should be
> worked on immediately. Everything else please retarget to 2.3.0 or 2.2.1.
>
> *But my bug isn't fixed!??!*
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from 2.1.1.
>





-- 

Joseph Bradley

Software Engineer - Machine Learning

Databricks, Inc.

[image: http://databricks.com] 


Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-09 Thread Sean Owen
Different errors as in https://issues.apache.org/jira/browse/SPARK-20520 but
that's also reporting R test failures.

I went back and tried to run the R tests and they passed, at least on
Ubuntu 17 / R 3.3.

On Fri, Jun 9, 2017 at 9:12 AM Nick Pentreath 
wrote:

> All Scala, Python tests pass. ML QA and doc issues are resolved (as well
> as R it seems).
>
> However, I'm seeing the following test failure on R consistently:
> https://gist.github.com/MLnick/5f26152f97ae8473f807c6895817cf72
>
>
> On Thu, 8 Jun 2017 at 08:48 Denny Lee  wrote:
>
>> +1 non-binding
>>
>> Tested on macOS Sierra, Ubuntu 16.04
>> test suite includes various test cases including Spark SQL, ML,
>> GraphFrames, Structured Streaming
>>
>>
>> On Wed, Jun 7, 2017 at 9:40 PM vaquar khan  wrote:
>>
>>> +1 non-binding
>>>
>>> Regards,
>>> vaquar khan
>>>
>>> On Jun 7, 2017 4:32 PM, "Ricardo Almeida" 
>>> wrote:
>>>
>>> +1 (non-binding)
>>>
>>> Built and tested with -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn -Phive
>>> -Phive-thriftserver -Pscala-2.11 on
>>>
>>>- Ubuntu 17.04, Java 8 (OpenJDK 1.8.0_111)
>>>- macOS 10.12.5 Java 8 (build 1.8.0_131)
>>>
>>>
>>> On 5 June 2017 at 21:14, Michael Armbrust 
>>> wrote:
>>>
 Please vote on releasing the following candidate as Apache Spark
 version 2.2.0. The vote is open until Thurs, June 8th, 2017 at 12:00
 PST and passes if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 2.2.0
 [ ] -1 Do not release this package because ...


 To learn more about Apache Spark, please see http://spark.apache.org/

 The tag to be voted on is v2.2.0-rc4
 (377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e)

 List of JIRA tickets resolved can be found with this filter
 
 .

 The release files, including signatures, digests, etc. can be found at:
 http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1241/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-docs/


 *FAQ*

 *How can I help test this release?*

 If you are a Spark user, you can help us test this release by taking an
 existing Spark workload and running on this release candidate, then
 reporting any regressions.

 *What should happen to JIRA tickets still targeting 2.2.0?*

 Committers should look at those and triage. Extremely important bug
 fixes, documentation, and API tweaks that impact compatibility should be
 worked on immediately. Everything else please retarget to 2.3.0 or 2.2.1.

 *But my bug isn't fixed!??!*

 In order to make timely releases, we will typically not hold the
 release unless the bug in question is a regression from 2.1.1.

>>>
>>>
>>>


Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-09 Thread Dong Joon Hyun
Hi, Nick.

Could you give us more information on your environment like R/JDK/OS?

Bests,
Dongjoon.

From: Nick Pentreath <nick.pentre...@gmail.com>
Date: Friday, June 9, 2017 at 1:12 AM
To: dev <dev@spark.apache.org>
Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)

All Scala, Python tests pass. ML QA and doc issues are resolved (as well as R 
it seems).

However, I'm seeing the following test failure on R consistently: 
https://gist.github.com/MLnick/5f26152f97ae8473f807c6895817cf72


On Thu, 8 Jun 2017 at 08:48 Denny Lee 
<denny.g@gmail.com<mailto:denny.g@gmail.com>> wrote:
+1 non-binding

Tested on macOS Sierra, Ubuntu 16.04
test suite includes various test cases including Spark SQL, ML, GraphFrames, 
Structured Streaming


On Wed, Jun 7, 2017 at 9:40 PM vaquar khan 
<vaquar.k...@gmail.com<mailto:vaquar.k...@gmail.com>> wrote:
+1 non-binding

Regards,
vaquar khan

On Jun 7, 2017 4:32 PM, "Ricardo Almeida" 
<ricardo.alme...@actnowib.com<mailto:ricardo.alme...@actnowib.com>> wrote:
+1 (non-binding)

Built and tested with -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn -Phive 
-Phive-thriftserver -Pscala-2.11 on

  *   Ubuntu 17.04, Java 8 (OpenJDK 1.8.0_111)
  *   macOS 10.12.5 Java 8 (build 1.8.0_131)

On 5 June 2017 at 21:14, Michael Armbrust 
<mich...@databricks.com<mailto:mich...@databricks.com>> wrote:
Please vote on releasing the following candidate as Apache Spark version 2.2.0. 
The vote is open until Thurs, June 8th, 2017 at 12:00 PST and passes if a 
majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.2.0
[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is 
v2.2.0-rc4<https://github.com/apache/spark/tree/v2.2.0-rc4> 
(377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e)

List of JIRA tickets resolved can be found with this 
filter<https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.2.0>.

The release files, including signatures, digests, etc. can be found at:
http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1241/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-docs/


FAQ

How can I help test this release?

If you are a Spark user, you can help us test this release by taking an 
existing Spark workload and running on this release candidate, then reporting 
any regressions.

What should happen to JIRA tickets still targeting 2.2.0?

Committers should look at those and triage. Extremely important bug fixes, 
documentation, and API tweaks that impact compatibility should be worked on 
immediately. Everything else please retarget to 2.3.0 or 2.2.1.

But my bug isn't fixed!??!

In order to make timely releases, we will typically not hold the release unless 
the bug in question is a regression from 2.1.1.
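The passing condition quoted above ("passes if a majority of at least 3 +1 PMC votes are cast") is commonly read as: at least three binding +1 votes, and more binding +1 than -1 votes. A toy sketch of that reading (an interpretation for illustration only, not official ASF tooling):

```python
def vote_passes(pmc_plus_ones: int, pmc_minus_ones: int) -> bool:
    # A release vote needs at least three binding +1 votes, and the
    # +1 votes must outnumber the -1 votes.
    return pmc_plus_ones >= 3 and pmc_plus_ones > pmc_minus_ones

print(vote_passes(3, 0))  # True: three +1s, no -1s
print(vote_passes(3, 3))  # False: +1s do not outnumber -1s
```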




Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-09 Thread Felix Cheung
Hmm, that's odd. This test would be in Jenkins too - let me double check

_
From: Nick Pentreath <nick.pentre...@gmail.com<mailto:nick.pentre...@gmail.com>>
Sent: Friday, June 9, 2017 1:12 AM
Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
To: dev <dev@spark.apache.org<mailto:dev@spark.apache.org>>


All Scala, Python tests pass. ML QA and doc issues are resolved (as well as R 
it seems).

However, I'm seeing the following test failure on R consistently: 
https://gist.github.com/MLnick/5f26152f97ae8473f807c6895817cf72


On Thu, 8 Jun 2017 at 08:48 Denny Lee 
<denny.g@gmail.com<mailto:denny.g@gmail.com>> wrote:
+1 non-binding

Tested on macOS Sierra, Ubuntu 16.04
test suite includes various test cases including Spark SQL, ML, GraphFrames, 
Structured Streaming


On Wed, Jun 7, 2017 at 9:40 PM vaquar khan 
<vaquar.k...@gmail.com<mailto:vaquar.k...@gmail.com>> wrote:
+1 non-binding

Regards,
vaquar khan

On Jun 7, 2017 4:32 PM, "Ricardo Almeida" 
<ricardo.alme...@actnowib.com<mailto:ricardo.alme...@actnowib.com>> wrote:
+1 (non-binding)

Built and tested with -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn -Phive 
-Phive-thriftserver -Pscala-2.11 on

  *   Ubuntu 17.04, Java 8 (OpenJDK 1.8.0_111)
  *   macOS 10.12.5 Java 8 (build 1.8.0_131)

On 5 June 2017 at 21:14, Michael Armbrust 
<mich...@databricks.com<mailto:mich...@databricks.com>> wrote:
Please vote on releasing the following candidate as Apache Spark version 2.2.0. 
The vote is open until Thurs, June 8th, 2017 at 12:00 PST and passes if a 
majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.2.0
[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is 
v2.2.0-rc4<https://github.com/apache/spark/tree/v2.2.0-rc4> 
(377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e)

List of JIRA tickets resolved can be found with this 
filter<https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.2.0>.

The release files, including signatures, digests, etc. can be found at:
http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1241/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-docs/


FAQ

How can I help test this release?

If you are a Spark user, you can help us test this release by taking an 
existing Spark workload and running on this release candidate, then reporting 
any regressions.

What should happen to JIRA tickets still targeting 2.2.0?

Committers should look at those and triage. Extremely important bug fixes, 
documentation, and API tweaks that impact compatibility should be worked on 
immediately. Everything else please retarget to 2.3.0 or 2.2.1.

But my bug isn't fixed!??!

In order to make timely releases, we will typically not hold the release unless 
the bug in question is a regression from 2.1.1.






Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-09 Thread Nick Pentreath
All Scala, Python tests pass. ML QA and doc issues are resolved (as well as
R it seems).

However, I'm seeing the following test failure on R consistently:
https://gist.github.com/MLnick/5f26152f97ae8473f807c6895817cf72


On Thu, 8 Jun 2017 at 08:48 Denny Lee  wrote:

> +1 non-binding
>
> Tested on macOS Sierra, Ubuntu 16.04
> test suite includes various test cases including Spark SQL, ML,
> GraphFrames, Structured Streaming
>
>
> On Wed, Jun 7, 2017 at 9:40 PM vaquar khan  wrote:
>
>> +1 non-binding
>>
>> Regards,
>> vaquar khan
>>
>> On Jun 7, 2017 4:32 PM, "Ricardo Almeida" 
>> wrote:
>>
>> +1 (non-binding)
>>
>> Built and tested with -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn -Phive
>> -Phive-thriftserver -Pscala-2.11 on
>>
>>- Ubuntu 17.04, Java 8 (OpenJDK 1.8.0_111)
>>- macOS 10.12.5 Java 8 (build 1.8.0_131)
>>
>>
>> On 5 June 2017 at 21:14, Michael Armbrust  wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark
>>> version 2.2.0. The vote is open until Thurs, June 8th, 2017 at 12:00
>>> PST and passes if a majority of at least 3 +1 PMC votes are cast.
>>>
>>> [ ] +1 Release this package as Apache Spark 2.2.0
>>> [ ] -1 Do not release this package because ...
>>>
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v2.2.0-rc4
>>> (https://github.com/apache/spark/tree/v2.2.0-rc4)
>>> (377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e)
>>>
>>> List of JIRA tickets resolved can be found with this filter
>>> (https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.2.0).
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://people.apache.org/keys/committer/pwendell.asc
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1241/
>>>
>>> The documentation corresponding to this release can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-docs/
>>>
>>>
>>> *FAQ*
>>>
>>> *How can I help test this release?*
>>>
>>> If you are a Spark user, you can help us test this release by taking an
>>> existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> *What should happen to JIRA tickets still targeting 2.2.0?*
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should be
>>> worked on immediately. Everything else please retarget to 2.3.0 or 2.2.1.
>>>
>>> *But my bug isn't fixed!??!*
>>>
>>> In order to make timely releases, we will typically not hold the release
>>> unless the bug in question is a regression from 2.1.1.
>>>
>>
>>
>>


Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-08 Thread Sean Owen
+1 from me. Felix et al indicated that the various "2.2" JIRAs had no
further actions. I retargeted most of the other 2.2.0-targeted JIRAs that
didn't seem like must-dos. We have no Blockers and I'm not aware of
any changes that must be in the 2.2 release that aren't.

These are the only remaining 2.2 issues, FYI:

SPARK-20520 R streaming tests failed on Windows
SPARK-15799 Release SparkR on CRAN
SPARK-18267 Distribute PySpark via Python Package Index (pypi)
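The "hashes and sigs" check mentioned in the quoted reply below amounts to recomputing each release file's digest locally and comparing it with the published value (plus a GPG signature check against the committer's key). A self-contained sketch of the digest comparison, where `hashlib` stands in for the `shasum`/`gpg` commands actually used and the byte strings are placeholders:

```python
import hashlib

def sha512_matches(data: bytes, published_hex: str) -> bool:
    # Recompute the SHA-512 digest locally and compare it with the
    # digest published alongside the release artifact.
    return hashlib.sha512(data).hexdigest() == published_hex

artifact = b"release artifact bytes"  # placeholder for the .tgz contents
published = hashlib.sha512(artifact).hexdigest()

print(sha512_matches(artifact, published))           # True: download is intact
print(sha512_matches(b"tampered bytes", published))  # False: mismatch
```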

On Tue, Jun 6, 2017 at 12:20 AM Sean Owen  wrote:

> On the latest Ubuntu, Java 8, with -Phive -Phadoop-2.7 -Pyarn, this passes
> all tests. It's looking good, pending a double-check on the outstanding
> JIRA questions.
>
> All the hashes and sigs are correct.
>
> On Mon, Jun 5, 2017 at 8:15 PM Michael Armbrust 
> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.2.0. The vote is open until Thurs, June 8th, 2017 at 12:00 PST and
>> passes if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 2.2.0
>> [ ] -1 Do not release this package because ...
>>
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v2.2.0-rc4
>> (https://github.com/apache/spark/tree/v2.2.0-rc4)
>> (377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e)
>>
>> List of JIRA tickets resolved can be found with this filter
>> (https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.2.0).
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1241/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-docs/
>>
>>
>> *FAQ*
>>
>> *How can I help test this release?*
>>
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> *What should happen to JIRA tickets still targeting 2.2.0?*
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should be
>> worked on immediately. Everything else please retarget to 2.3.0 or 2.2.1.
>>
>> *But my bug isn't fixed!??!*
>>
>> In order to make timely releases, we will typically not hold the release
>> unless the bug in question is a regression from 2.1.1.
>>
>


Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-08 Thread Denny Lee
+1 non-binding

Tested on macOS Sierra and Ubuntu 16.04.
The test suite covers various cases, including Spark SQL, ML,
GraphFrames, and Structured Streaming.


On Wed, Jun 7, 2017 at 9:40 PM vaquar khan  wrote:

> +1 non-binding
>
> Regards,
> vaquar khan
>
> On Jun 7, 2017 4:32 PM, "Ricardo Almeida" 
> wrote:
>
> +1 (non-binding)
>
> Built and tested with -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn -Phive
> -Phive-thriftserver -Pscala-2.11 on
>
>- Ubuntu 17.04, Java 8 (OpenJDK 1.8.0_111)
>- macOS 10.12.5 Java 8 (build 1.8.0_131)
>
>
> On 5 June 2017 at 21:14, Michael Armbrust  wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.2.0. The vote is open until Thurs, June 8th, 2017 at 12:00 PST and
>> passes if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 2.2.0
>> [ ] -1 Do not release this package because ...
>>
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v2.2.0-rc4
>> (https://github.com/apache/spark/tree/v2.2.0-rc4)
>> (377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e)
>>
>> List of JIRA tickets resolved can be found with this filter
>> (https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.2.0).
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1241/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-docs/
>>
>>
>> *FAQ*
>>
>> *How can I help test this release?*
>>
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> *What should happen to JIRA tickets still targeting 2.2.0?*
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should be
>> worked on immediately. Everything else please retarget to 2.3.0 or 2.2.1.
>>
>> *But my bug isn't fixed!??!*
>>
>> In order to make timely releases, we will typically not hold the release
>> unless the bug in question is a regression from 2.1.1.
>>
>
>
>


Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-07 Thread vaquar khan
+1 non-binding

Regards,
vaquar khan

On Jun 7, 2017 4:32 PM, "Ricardo Almeida" 
wrote:

+1 (non-binding)

Built and tested with -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn -Phive
-Phive-thriftserver -Pscala-2.11 on

   - Ubuntu 17.04, Java 8 (OpenJDK 1.8.0_111)
   - macOS 10.12.5 Java 8 (build 1.8.0_131)


On 5 June 2017 at 21:14, Michael Armbrust  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 2.2.0. The vote is open until Thurs, June 8th, 2017 at 12:00 PST and
> passes if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.2.0
> [ ] -1 Do not release this package because ...
>
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.2.0-rc4
> (https://github.com/apache/spark/tree/v2.2.0-rc4)
> (377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e)
>
> List of JIRA tickets resolved can be found with this filter
> (https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.2.0).
>
> The release files, including signatures, digests, etc. can be found at:
> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1241/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-docs/
>
>
> *FAQ*
>
> *How can I help test this release?*
>
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> *What should happen to JIRA tickets still targeting 2.2.0?*
>
> Committers should look at those and triage. Extremely important bug fixes,
> documentation, and API tweaks that impact compatibility should be worked on
> immediately. Everything else please retarget to 2.3.0 or 2.2.1.
>
> *But my bug isn't fixed!??!*
>
> In order to make timely releases, we will typically not hold the release
> unless the bug in question is a regression from 2.1.1.
>


Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-07 Thread Ricardo Almeida
+1 (non-binding)

Built and tested with -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn -Phive
-Phive-thriftserver -Pscala-2.11 on

   - Ubuntu 17.04, Java 8 (OpenJDK 1.8.0_111)
   - macOS 10.12.5 Java 8 (build 1.8.0_131)


On 5 June 2017 at 21:14, Michael Armbrust  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 2.2.0. The vote is open until Thurs, June 8th, 2017 at 12:00 PST and
> passes if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.2.0
> [ ] -1 Do not release this package because ...
>
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.2.0-rc4
> (https://github.com/apache/spark/tree/v2.2.0-rc4)
> (377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e)
>
> List of JIRA tickets resolved can be found with this filter
> (https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.2.0).
>
> The release files, including signatures, digests, etc. can be found at:
> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1241/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-docs/
>
>
> *FAQ*
>
> *How can I help test this release?*
>
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> *What should happen to JIRA tickets still targeting 2.2.0?*
>
> Committers should look at those and triage. Extremely important bug fixes,
> documentation, and API tweaks that impact compatibility should be worked on
> immediately. Everything else please retarget to 2.3.0 or 2.2.1.
>
> *But my bug isn't fixed!??!*
>
> In order to make timely releases, we will typically not hold the release
> unless the bug in question is a regression from 2.1.1.
>


Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-06 Thread Dong Joon Hyun
+1 (non-binding)

I built and tested on CentOS 7.3.1611 / OpenJDK 1.8.131 / R 3.3.3
with "-Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver -Psparkr".
Java/Scala/R tests passed as expected.

There are two minor things.


  1.  For the deprecation documentation issue
      (https://github.com/apache/spark/pull/18207), I hope it goes into the
      release notes instead of blocking the current vote, something like
      http://spark.apache.org/releases/spark-release-2-1-0.html.

  2.  Third-party test suites may fail due to the following difference
      (https://issues.apache.org/jira/browse/SPARK-20954). Previously, up to
      and including Spark 2.1.1, the count was 1.

scala> sql("create table t(a int)")
res0: org.apache.spark.sql.DataFrame = []

scala> sql("desc table t").show
+----------+---------+-------+
|  col_name|data_type|comment|
+----------+---------+-------+
|# col_name|data_type|comment|
|         a|      int|   null|
+----------+---------+-------+

scala> sql("desc table t").count
res2: Long = 2

Bests,
Dongjoon.




From: Michael Armbrust <mich...@databricks.com>
Date: Monday, June 5, 2017 at 12:14 PM
To: "dev@spark.apache.org" <dev@spark.apache.org>
Subject: [VOTE] Apache Spark 2.2.0 (RC4)

Please vote on releasing the following candidate as Apache Spark version 2.2.0. 
The vote is open until Thurs, June 8th, 2017 at 12:00 PST and passes if a 
majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.2.0
[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is 
v2.2.0-rc4<https://github.com/apache/spark/tree/v2.2.0-rc4> 
(377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e)

List of JIRA tickets resolved can be found with this 
filter<https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.2.0>.

The release files, including signatures, digests, etc. can be found at:
http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1241/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-docs/


FAQ

How can I help test this release?

If you are a Spark user, you can help us test this release by taking an 
existing Spark workload and running on this release candidate, then reporting 
any regressions.

What should happen to JIRA tickets still targeting 2.2.0?

Committers should look at those and triage. Extremely important bug fixes, 
documentation, and API tweaks that impact compatibility should be worked on 
immediately. Everything else please retarget to 2.3.0 or 2.2.1.

But my bug isn't fixed!??!

In order to make timely releases, we will typically not hold the release unless 
the bug in question is a regression from 2.1.1.


Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-06 Thread Holden Karau
+1. pip install into a local virtualenv works; no local version string (which
was blocking the PyPI upload).
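The "local version string" mentioned here refers to PEP 440 local version identifiers (a `+something` suffix such as `2.2.0+hadoop2.7`), which PyPI rejects on upload. A minimal check of a version string, sketched with only the standard library (the regex is an illustration, not Spark's actual release tooling):

```python
import re

def has_local_version(version: str) -> bool:
    # PEP 440 local version identifiers follow a "+", e.g. "2.2.0+hadoop2.7";
    # PyPI refuses uploads whose version carries such a suffix.
    return bool(re.search(r"\+[A-Za-z0-9](?:[A-Za-z0-9.]*[A-Za-z0-9])?$", version))

print(has_local_version("2.2.0"))            # False: fine to upload
print(has_local_version("2.2.0+hadoop2.7"))  # True: would block the upload
```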


On Tue, Jun 6, 2017 at 8:03 AM, Felix Cheung <felixcheun...@hotmail.com>
wrote:

> All tasks on the R QA umbrella are completed
> SPARK-20512
>
> We can close this.
>
>
>
> _
> From: Sean Owen <so...@cloudera.com>
> Sent: Tuesday, June 6, 2017 1:16 AM
> Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
> To: Michael Armbrust <mich...@databricks.com>
> Cc: <dev@spark.apache.org>
>
>
>
> On Tue, Jun 6, 2017 at 1:06 AM Michael Armbrust <mich...@databricks.com>
> wrote:
>
>> Regarding the readiness of this and previous RCs.  I did cut RC1 & RC2
>> knowing that they were unlikely to pass.  That said, I still think these
>> early RCs are valuable. I know several users that wanted to test new
>> features in 2.2 that have used them.  Now, if we would prefer to call them
>> preview or RC0 or something I'd be okay with that as well.
>>
>
> They are valuable, I only suggest it's better to note explicitly when
> there are blockers or must-do tasks that will fail the RC. It makes a big
> difference to whether one would like to +1.
>
> I meant more than just calling them something different. An early RC could
> be voted as a released 'preview' artifact, at the start of the notional QA
> period, with a lower bar to passing, and releasable with known issues. This
> encourages more testing. It also resolves the controversy about whether
> it's OK to include an RC in a product (separate thread).
>
>
> Regarding doc updates, I don't think it is a requirement that they be
>> voted on as part of the release.  Even if they are something version
>> specific.  I think we have regularly updated the website with documentation
>> that was merged after the release.
>>
>
> They're part of the source release too, as markdown, and should be voted
> on. I've never understood otherwise. Have we actually released docs and
> then later changed them, so that they don't match the release? I don't
> recall that, but I do recall updating the non-version-specific website.
>
> Aside from the oddity of having docs generated from x.y source not match
> docs published for x.y, you want the same protections for doc source that
> the project distributes as anything else. It's not just correctness, but
> liability. The hypothetical is always that someone included copyrighted
> text or something without permission and now the project can't rely on the
> argument that it made a good-faith effort to review what it released on the
> site. Someone becomes personally liable.
>
> These are pretty technical reasons though. More practically, what's the
> hurry to release if docs aren't done (_if_ they're not done)? It's being
> presented as normal practice, but seems quite exceptional.
>
>
>
>> I personally don't think the QA umbrella JIRAs are particularly
>> effective, but I also wouldn't ban their use if others think they are.
>> However, I do think that real QA needs an RC to test, so I think it is fine
>> that there is still outstanding QA to be done when an RC is cut.  For
>> example, I plan to run a bunch of streaming workloads on RC4 and will vote
>> accordingly.
>>
>
> QA on RCs is great (see above). The problem is, I can't distinguish
> between a JIRA that means "we must test in general", which sounds like
> something you too would ignore, and one that means "there is specific
> functionality we have to check before a release that we haven't looked at
> yet", which is a committer waving a flag that they implicitly do not want a
> release until resolved. I wouldn't +1 a release that had a Blocker software
> defect one of us reported.
>
> I know I'm harping on this, but this is the one mechanism we do use
> consistently (Blocker JIRAs) to clearly communicate about issues vital to a
> go / no-go release decision, and I think this interferes. The rest of JIRA
> noise doesn't matter much. You can see we're already resorting to secondary
> communications as a result ("anyone have any issues that need to be fixed
> before I cut another RC?" emails) because this is kind of ignored, and I
> think we're swapping out a decent mechanism for a worse one.
>
> I suspect, as you do, that there's no to-do here in which case they should
> be resolved and we're still on track for release. I'd wait on +1 until then.
>
>
>
>


-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau


Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-06 Thread Felix Cheung
All tasks on the R QA umbrella are completed
SPARK-20512

We can close this.



_
From: Sean Owen <so...@cloudera.com<mailto:so...@cloudera.com>>
Sent: Tuesday, June 6, 2017 1:16 AM
Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
To: Michael Armbrust <mich...@databricks.com<mailto:mich...@databricks.com>>
Cc: <dev@spark.apache.org<mailto:dev@spark.apache.org>>


On Tue, Jun 6, 2017 at 1:06 AM Michael Armbrust 
<mich...@databricks.com<mailto:mich...@databricks.com>> wrote:
Regarding the readiness of this and previous RCs.  I did cut RC1 & RC2 knowing 
that they were unlikely to pass.  That said, I still think these early RCs are 
valuable. I know several users that wanted to test new features in 2.2 that 
have used them.  Now, if we would prefer to call them preview or RC0 or 
something I'd be okay with that as well.

They are valuable, I only suggest it's better to note explicitly when there are 
blockers or must-do tasks that will fail the RC. It makes a big difference to 
whether one would like to +1.

I meant more than just calling them something different. An early RC could be 
voted as a released 'preview' artifact, at the start of the notional QA period, 
with a lower bar to passing, and releasable with known issues. This encourages 
more testing. It also resolves the controversy about whether it's OK to include 
an RC in a product (separate thread).


Regarding doc updates, I don't think it is a requirement that they be voted on 
as part of the release.  Even if they are something version specific.  I think 
we have regularly updated the website with documentation that was merged after 
the release.

They're part of the source release too, as markdown, and should be voted on. 
I've never understood otherwise. Have we actually released docs and then later 
changed them, so that they don't match the release? I don't recall that, but I 
do recall updating the non-version-specific website.

Aside from the oddity of having docs generated from x.y source not match docs 
published for x.y, you want the same protections for doc source that the 
project distributes as anything else. It's not just correctness, but liability. 
The hypothetical is always that someone included copyrighted text or something 
without permission and now the project can't rely on the argument that it made 
a good-faith effort to review what it released on the site. Someone becomes 
personally liable.

These are pretty technical reasons though. More practically, what's the hurry 
to release if docs aren't done (_if_ they're not done)? It's being presented as 
normal practice, but seems quite exceptional.


I personally don't think the QA umbrella JIRAs are particularly effective, but 
I also wouldn't ban their use if others think they are.  However, I do think 
that real QA needs an RC to test, so I think it is fine that there is still 
outstanding QA to be done when an RC is cut.  For example, I plan to run a 
bunch of streaming workloads on RC4 and will vote accordingly.

QA on RCs is great (see above). The problem is, I can't distinguish between a 
JIRA that means "we must test in general", which sounds like something you too 
would ignore, and one that means "there is specific functionality we have to 
check before a release that we haven't looked at yet", which is a committer 
waving a flag that they implicitly do not want a release until resolved. I 
wouldn't +1 a release that had a Blocker software defect one of us reported.

I know I'm harping on this, but this is the one mechanism we do use 
consistently (Blocker JIRAs) to clearly communicate about issues vital to a go 
/ no-go release decision, and I think this interferes. The rest of JIRA noise 
doesn't matter much. You can see we're already resorting to secondary 
communications as a result ("anyone have any issues that need to be fixed 
before I cut another RC?" emails) because this is kind of ignored, and I think 
we're swapping out a decent mechanism for a worse one.

I suspect, as you do, that there's no to-do here in which case they should be 
resolved and we're still on track for release. I'd wait on +1 until then.





Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-06 Thread Nick Pentreath
Now, on the subject of (ML) QA JIRAs.

From the ML side, I believe they are required (I think others such as
Joseph will agree and in fact have already said as much).

Most are marked as Blockers, though of those, the Python API coverage is
strictly speaking not a Blocker, as we will never hold the release for API
parity issues (unless of course there is some critical bug or missing thing,
but that really falls under the standard RC bug triage process).

I believe they are Blockers, since they involve auditing binary compat,
new public APIs, visibility issues, Java compat, etc. I think it's obvious
that an RC should not pass if these have not been checked.

I actually agree that docs and user guide are absolutely part of the
release, and in fact are one of the more important pieces of the release.
Apart from the issues Sean mentions, not treating these things as critical
issues or even blockers is what inevitably, over time, leads to the user
guide being out of date, missing important features, etc.

In practice for ML at least we definitely aim to have all the doc / guide
issues done before the final release.

Now in terms of process, none of these QA issues really requires an RC; they
can all be carried out once the release branch is cut. Some of the issues,
like binary compat, are perhaps a bit more tricky, but that inevitably involves
manually checking through the MiMa exclusions added to verify they are OK, etc.
- so again an actual RC is not required here.

So really the answer is to more aggressively burn down these QA issues the
moment the release branch has been cut. Again, I think this echoes what
Joseph has said in previous threads.



On Tue, 6 Jun 2017 at 10:16 Sean Owen  wrote:

> On Tue, Jun 6, 2017 at 1:06 AM Michael Armbrust 
> wrote:
>
>> Regarding the readiness of this and previous RCs.  I did cut RC1 & RC2
>> knowing that they were unlikely to pass.  That said, I still think these
>> early RCs are valuable. I know several users that wanted to test new
>> features in 2.2 that have used them.  Now, if we would prefer to call them
>> preview or RC0 or something I'd be okay with that as well.
>>
>
> They are valuable, I only suggest it's better to note explicitly when
> there are blockers or must-do tasks that will fail the RC. It makes a big
> difference to whether one would like to +1.
>
> I meant more than just calling them something different. An early RC could
> be voted as a released 'preview' artifact, at the start of the notional QA
> period, with a lower bar to passing, and releasable with known issues. This
> encourages more testing. It also resolves the controversy about whether
> it's OK to include an RC in a product (separate thread).
>
>
> Regarding doc updates, I don't think it is a requirement that they be
>> voted on as part of the release.  Even if they are something version
>> specific.  I think we have regularly updated the website with documentation
>> that was merged after the release.
>>
>
> They're part of the source release too, as markdown, and should be voted
> on. I've never understood otherwise. Have we actually released docs and
> then later changed them, so that they don't match the release? I don't
> recall that, but I do recall updating the non-version-specific website.
>
> Aside from the oddity of having docs generated from x.y source not match
> docs published for x.y, you want the same protections for doc source that
> the project distributes as anything else. It's not just correctness, but
> liability. The hypothetical is always that someone included copyrighted
> text or something without permission and now the project can't rely on the
> argument that it made a good-faith effort to review what it released on the
> site. Someone becomes personally liable.
>
> These are pretty technical reasons though. More practically, what's the
> hurry to release if docs aren't done (_if_ they're not done)? It's being
> presented as normal practice, but seems quite exceptional.
>
>
>
>> I personally don't think the QA umbrella JIRAs are particularly
>> effective, but I also wouldn't ban their use if others think they are.
>> However, I do think that real QA needs an RC to test, so I think it is fine
>> that there is still outstanding QA to be done when an RC is cut.  For
>> example, I plan to run a bunch of streaming workloads on RC4 and will vote
>> accordingly.
>>
>
> QA on RCs is great (see above). The problem is, I can't distinguish
> between a JIRA that means "we must test in general", which sounds like
> something you too would ignore, and one that means "there is specific
> functionality we have to check before a release that we haven't looked at
> yet", which is a committer waving a flag that they implicitly do not want a
> release until resolved. I wouldn't +1 a release that had a Blocker software
> defect one of us reported.
>
> I know I'm harping on this, but this is the one mechanism we do use
> consistently (Blocker JIRAs) to clearly 

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-06 Thread Nick Pentreath
The website updates for ML QA (SPARK-20507) are not *actually* critical as
the project website certainly can be updated separately from the source
code guide and is not part of the release to be voted on. In future that
particular work item for the QA process could be marked down in priority,
and is definitely not a release blocker.

In any event I just resolved SPARK-20507, as I don't believe any website
updates are required for this release anyway. That fully resolves the ML QA
umbrella (SPARK-20499).


On Tue, 6 Jun 2017 at 10:16 Sean Owen  wrote:

> On Tue, Jun 6, 2017 at 1:06 AM Michael Armbrust 
> wrote:
>
>> Regarding the readiness of this and previous RCs.  I did cut RC1 & RC2
>> knowing that they were unlikely to pass.  That said, I still think these
>> early RCs are valuable. I know several users that wanted to test new
>> features in 2.2 that have used them.  Now, if we would prefer to call them
>> preview or RC0 or something I'd be okay with that as well.
>>
>
> They are valuable, I only suggest it's better to note explicitly when
> there are blockers or must-do tasks that will fail the RC. It makes a big
> difference to whether one would like to +1.
>
> I meant more than just calling them something different. An early RC could
> be voted as a released 'preview' artifact, at the start of the notional QA
> period, with a lower bar to passing, and releasable with known issues. This
> encourages more testing. It also resolves the controversy about whether
> it's OK to include an RC in a product (separate thread).
>
>
> Regarding doc updates, I don't think it is a requirement that they be
>> voted on as part of the release.  Even if they are something version
>> specific.  I think we have regularly updated the website with documentation
>> that was merged after the release.
>>
>
> They're part of the source release too, as markdown, and should be voted
> on. I've never understood otherwise. Have we actually released docs and
> then later changed them, so that they don't match the release? I don't
> recall that, but I do recall updating the non-version-specific website.
>
> Aside from the oddity of having docs generated from x.y source not match
> docs published for x.y, you want the same protections for doc source that
> the project distributes as anything else. It's not just correctness, but
> liability. The hypothetical is always that someone included copyrighted
> text or something without permission and now the project can't rely on the
> argument that it made a good-faith effort to review what it released on the
> site. Someone becomes personally liable.
>
> These are pretty technical reasons though. More practically, what's the
> hurry to release if docs aren't done (_if_ they're not done)? It's being
> presented as normal practice, but seems quite exceptional.
>
>
>
>> I personally don't think the QA umbrella JIRAs are particularly
>> effective, but I also wouldn't ban their use if others think they are.
>> However, I do think that real QA needs an RC to test, so I think it is fine
>> that there is still outstanding QA to be done when an RC is cut.  For
>> example, I plan to run a bunch of streaming workloads on RC4 and will vote
>> accordingly.
>>
>
> QA on RCs is great (see above). The problem is, I can't distinguish
> between a JIRA that means "we must test in general", which sounds like
> something you too would ignore, and one that means "there is specific
> functionality we have to check before a release that we haven't looked at
> yet", which is a committer waving a flag that they implicitly do not want a
> release until resolved. I wouldn't +1 a release that had a Blocker software
> defect one of us reported.
>
> I know I'm harping on this, but this is the one mechanism we do use
> consistently (Blocker JIRAs) to clearly communicate about issues vital to a
> go / no-go release decision, and I think this interferes. The rest of JIRA
> noise doesn't matter much. You can see we're already resorting to secondary
> communications as a result ("anyone have any issues that need to be fixed
> before I cut another RC?" emails) because this is kind of ignored, and I
> think we're swapping out a decent mechanism for a worse one.
>
> I suspect, as you do, that there's no to-do here, in which case they should
> be resolved and we're still on track for release. I'd wait on +1 until then.
>
>


Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-06 Thread Sean Owen
On Tue, Jun 6, 2017 at 1:06 AM Michael Armbrust 
wrote:

> Regarding the readiness of this and previous RCs.  I did cut RC1 & RC2
> knowing that they were unlikely to pass.  That said, I still think these
> early RCs are valuable. I know several users that wanted to test new
> features in 2.2 that have used them.  Now, if we would prefer to call them
> preview or RC0 or something I'd be okay with that as well.
>

They are valuable, I only suggest it's better to note explicitly when there
are blockers or must-do tasks that will fail the RC. It makes a big
difference to whether one would like to +1.

I meant more than just calling them something different. An early RC could
be voted as a released 'preview' artifact, at the start of the notional QA
period, with a lower bar to passing, and releasable with known issues. This
encourages more testing. It also resolves the controversy about whether
it's OK to include an RC in a product (separate thread).


Regarding doc updates, I don't think it is a requirement that they be voted
> on as part of the release.  Even if they are something version specific.  I
> think we have regularly updated the website with documentation that was
> merged after the release.
>

They're part of the source release too, as markdown, and should be voted
on. I've never understood otherwise. Have we actually released docs and
then later changed them, so that they don't match the release? I don't
recall that, but I do recall updating the non-version-specific website.

Aside from the oddity of having docs generated from x.y source not match
docs published for x.y, you want the same protections for doc source that
the project distributes as anything else. It's not just correctness, but
liability. The hypothetical is always that someone included copyrighted
text or something without permission and now the project can't rely on the
argument that it made a good-faith effort to review what it released on the
site. Someone becomes personally liable.

These are pretty technical reasons though. More practically, what's the
hurry to release if docs aren't done (_if_ they're not done)? It's being
presented as normal practice, but seems quite exceptional.



> I personally don't think the QA umbrella JIRAs are particularly effective,
> but I also wouldn't ban their use if others think they are.  However, I do
> think that real QA needs an RC to test, so I think it is fine that there is
> still outstanding QA to be done when an RC is cut.  For example, I plan to
> run a bunch of streaming workloads on RC4 and will vote accordingly.
>

QA on RCs is great (see above). The problem is, I can't distinguish between
a JIRA that means "we must test in general", which sounds like something
you too would ignore, and one that means "there is specific functionality
we have to check before a release that we haven't looked at yet", which is
a committer waving a flag that they implicitly do not want a release until
resolved. I wouldn't +1 a release that had a Blocker software defect one of
us reported.

I know I'm harping on this, but this is the one mechanism we do use
consistently (Blocker JIRAs) to clearly communicate about issues vital to a
go / no-go release decision, and I think this interferes. The rest of JIRA
noise doesn't matter much. You can see we're already resorting to secondary
communications as a result ("anyone have any issues that need to be fixed
before I cut another RC?" emails) because this is kind of ignored, and I
think we're swapping out a decent mechanism for a worse one.

I suspect, as you do, that there's no to-do here, in which case they should
be resolved and we're still on track for release. I'd wait on +1 until then.


Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-05 Thread Kazuaki Ishizaki
+1 (non-binding)

I tested it on Ubuntu 16.04 and OpenJDK8 on ppc64le. All of the tests for 
core have passed.

$ java -version
openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 
1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
$ build/mvn -DskipTests -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7 
package install
$ build/mvn -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7 test -pl core
...
Run completed in 15 minutes, 30 seconds.
Total number of tests run: 1959
Suites: completed 206, aborted 0
Tests: succeeded 1959, failed 0, canceled 4, ignored 8, pending 0
All tests passed.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 17:16 min
[INFO] Finished at: 2017-06-06T13:44:48+09:00
[INFO] Final Memory: 53M/510M
[INFO] ------------------------------------------------------------------------
[WARNING] The requested profile "hive" could not be activated because it 
does not exist.

Kazuaki Ishizaki



From:   Michael Armbrust <mich...@databricks.com>
To: "dev@spark.apache.org" <dev@spark.apache.org>
Date:   2017/06/06 04:15
Subject:[VOTE] Apache Spark 2.2.0 (RC4)



Please vote on releasing the following candidate as Apache Spark version 
2.2.0. The vote is open until Thurs, June 8th, 2017 at 12:00 PST and 
passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.2.0
[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.2.0-rc4 (377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e)

List of JIRA tickets resolved can be found with this filter.

The release files, including signatures, digests, etc. can be found at:
http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1241/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-docs/


FAQ

How can I help test this release?

If you are a Spark user, you can help us test this release by taking an 
existing Spark workload and running on this release candidate, then 
reporting any regressions.

What should happen to JIRA tickets still targeting 2.2.0?

Committers should look at those and triage. Extremely important bug fixes, 
documentation, and API tweaks that impact compatibility should be worked 
on immediately. Everything else please retarget to 2.3.0 or 2.2.1.

But my bug isn't fixed!??!

In order to make timely releases, we will typically not hold the release 
unless the bug in question is a regression from 2.1.1.




Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-05 Thread Michael Armbrust
Apologies for messing up the https urls.  My mistake.  I'll try to get it
right next time.

Regarding the readiness of this and previous RCs.  I did cut RC1 & RC2
knowing that they were unlikely to pass.  That said, I still think these
early RCs are valuable. I know several users that wanted to test new
features in 2.2 that have used them.  Now, if we would prefer to call them
preview or RC0 or something I'd be okay with that as well.

Regarding doc updates, I don't think it is a requirement that they be voted
on as part of the release.  Even if they are something version specific.  I
think we have regularly updated the website with documentation that was
merged after the release.

I personally don't think the QA umbrella JIRAs are particularly effective,
but I also wouldn't ban their use if others think they are.  However, I do
think that real QA needs an RC to test, so I think it is fine that there is
still outstanding QA to be done when an RC is cut.  For example, I plan to
run a bunch of streaming workloads on RC4 and will vote accordingly.

TL;DR: Based on what I have heard from everyone so far, there are currently
no known issues that should fail the vote here.  We should begin testing
RC4.  Thanks to everyone for your help!

On Mon, Jun 5, 2017 at 1:20 PM, Sean Owen  wrote:

> (I apologize for going on about this, but I've asked ~4 times: could you
> make the URLs here in the form email HTTPS URLs? It sounds minor, but we're
> asking people to verify the integrity of software and hashes, and this is
> the one case where it is actually important.)
>
> The "2.2" JIRAs don't look like updates to the non-version-specific web
> pages. If they affect release docs (i.e. under spark.apache.org/docs/),
> or the code, those QA/doc updates have to happen before a release. Right? I
> feel like this is self-evident but this comes up every minor release, that
> some testing or doc changes for a release can happen after the code and
> docs for the release are finalized. They obviously can't.
>
> I know, I get it. I think the reality is that the reporters don't believe
> there is something must-do for the 2.2.0 release, or else they'd have
> spoken up. In that case, these should be closed already as they're
> semantically "Blockers" and we shouldn't make an RC that can't pass.
>
> ... or should we? Actually, to me the idea of an "RC0" release as a
> preview, and RCs that are known to fail for testing purposes seem OK. But
> if that's the purpose here, let's say it.
>
> If the "QA" JIRAs just represent that 'we will test things, in general',
> then I think they're superfluous at best. These aren't used consistently,
> and their intent isn't actionable (i.e. it sounds like no particular
> testing resolves the JIRA). They signal something that doesn't seem to
> match the intent.
>
> Can we close the QA JIRAs -- and are there any actual must-have docs not
> already in the 2.2 branch?
>
> On Mon, Jun 5, 2017 at 8:52 PM Michael Armbrust 
> wrote:
>
>> I commented on that JIRA, I don't think that should block the release.
>> We can support both options long term if this vote passes.  Looks like the
>> remaining JIRAs are doc/website updates that can happen after the vote or
>> QA that should be done on this RC.  I think we are ready to start testing
>> this release seriously!
>>
>> On Mon, Jun 5, 2017 at 12:40 PM, Sean Owen  wrote:
>>
>>> Xiao opened a blocker on 2.2.0 this morning:
>>>
>>> SPARK-20980 Rename the option `wholeFile` to `multiLine` for JSON and CSV
>>>
>>> I don't see that this should block?
>>>
>>> We still have 7 Critical issues:
>>>
>>> SPARK-20520 R streaming tests failed on Windows
>>> SPARK-20512 SparkR 2.2 QA: Programming guide, migration guide, vignettes
>>> updates
>>> SPARK-20499 Spark MLlib, GraphX 2.2 QA umbrella
>>> SPARK-20508 Spark R 2.2 QA umbrella
>>> SPARK-20513 Update SparkR website for 2.2
>>> SPARK-20510 SparkR 2.2 QA: Update user guide for new features & APIs
>>> SPARK-20507 Update MLlib, GraphX websites for 2.2
>>>
>>> I'm going to assume that the R test issue isn't actually that big a
>>> deal, and that the 2.2 items are done. Anything that really is for 2.2
>>> needs to block the release; Joseph what's the status on those?
>>>
>>> On Mon, Jun 5, 2017 at 8:15 PM Michael Armbrust 
>>> wrote:
>>>
 Please vote on releasing the following candidate as Apache Spark
 version 2.2.0. The vote is open until Thurs, June 8th, 2017 at 12:00
 PST and passes if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 2.2.0
 [ ] -1 Do not release this package because ...


 To learn more about Apache Spark, please see http://spark.apache.org/

 The tag to be voted on is v2.2.0-rc4
 (377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e)

 List of JIRA tickets resolved can be found 

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-05 Thread Sean Owen
On the latest Ubuntu, Java 8, with -Phive -Phadoop-2.7 -Pyarn, this passes
all tests. It's looking good, pending a double-check on the outstanding
JIRA questions.

All the hashes and sigs are correct.
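For anyone repeating this check on the release artifacts, computing the digests by hand is mechanical. The sketch below is a generic, hypothetical helper (the function name and the example file name are illustrative, not part of any Spark tooling); the computed hex digest is then compared against the published checksum file:

```python
import hashlib

def file_digest(path, algorithm="sha512", chunk_size=1 << 20):
    """Stream a file in chunks and return its hex digest, so large
    release tarballs never need to fit in memory at once."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical usage against a downloaded artifact and its published digest:
# assert file_digest("spark-2.2.0-bin-hadoop2.7.tgz") == expected_sha512
```

Signature checking is separate: import the release manager's key from the URL in the vote email, then run `gpg --verify` on the `.asc` file against the tarball.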

On Mon, Jun 5, 2017 at 8:15 PM Michael Armbrust 
wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 2.2.0. The vote is open until Thurs, June 8th, 2017 at 12:00 PST and
> passes if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.2.0
> [ ] -1 Do not release this package because ...
>
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.2.0-rc4
> (377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e)
>
> List of JIRA tickets resolved can be found with this filter
> 
> .
>
> The release files, including signatures, digests, etc. can be found at:
> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1241/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-docs/
>
>
> *FAQ*
>
> *How can I help test this release?*
>
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> *What should happen to JIRA tickets still targeting 2.2.0?*
>
> Committers should look at those and triage. Extremely important bug fixes,
> documentation, and API tweaks that impact compatibility should be worked on
> immediately. Everything else please retarget to 2.3.0 or 2.2.1.
>
> *But my bug isn't fixed!??!*
>
> In order to make timely releases, we will typically not hold the release
> unless the bug in question is a regression from 2.1.1.
>


Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-05 Thread Sean Owen
(I apologize for going on about this, but I've asked ~4 times: could you
make the URLs here in the form email HTTPS URLs? It sounds minor, but we're
asking people to verify the integrity of software and hashes, and this is
the one case where it is actually important.)

The "2.2" JIRAs don't look like updates to the non-version-specific web
pages. If they affect release docs (i.e. under spark.apache.org/docs/), or
the code, those QA/doc updates have to happen before a release. Right? I
feel like this is self-evident but this comes up every minor release, that
some testing or doc changes for a release can happen after the code and
docs for the release are finalized. They obviously can't.

I know, I get it. I think the reality is that the reporters don't believe
there is something must-do for the 2.2.0 release, or else they'd have
spoken up. In that case, these should be closed already as they're
semantically "Blockers" and we shouldn't make an RC that can't pass.

... or should we? Actually, to me the idea of an "RC0" release as a
preview, and RCs that are known to fail for testing purposes seem OK. But
if that's the purpose here, let's say it.

If the "QA" JIRAs just represent that 'we will test things, in general',
then I think they're superfluous at best. These aren't used consistently,
and their intent isn't actionable (i.e. it sounds like no particular
testing resolves the JIRA). They signal something that doesn't seem to
match the intent.

Can we close the QA JIRAs -- and are there any actual must-have docs not
already in the 2.2 branch?

On Mon, Jun 5, 2017 at 8:52 PM Michael Armbrust 
wrote:

> I commented on that JIRA, I don't think that should block the release.  We
> can support both options long term if this vote passes.  Looks like the
> remaining JIRAs are doc/website updates that can happen after the vote or
> QA that should be done on this RC.  I think we are ready to start testing
> this release seriously!
>
> On Mon, Jun 5, 2017 at 12:40 PM, Sean Owen  wrote:
>
>> Xiao opened a blocker on 2.2.0 this morning:
>>
>> SPARK-20980 Rename the option `wholeFile` to `multiLine` for JSON and CSV
>>
>> I don't see that this should block?
>>
>> We still have 7 Critical issues:
>>
>> SPARK-20520 R streaming tests failed on Windows
>> SPARK-20512 SparkR 2.2 QA: Programming guide, migration guide, vignettes
>> updates
>> SPARK-20499 Spark MLlib, GraphX 2.2 QA umbrella
>> SPARK-20508 Spark R 2.2 QA umbrella
>> SPARK-20513 Update SparkR website for 2.2
>> SPARK-20510 SparkR 2.2 QA: Update user guide for new features & APIs
>> SPARK-20507 Update MLlib, GraphX websites for 2.2
>>
>> I'm going to assume that the R test issue isn't actually that big a deal,
>> and that the 2.2 items are done. Anything that really is for 2.2 needs to
>> block the release; Joseph what's the status on those?
>>
>> On Mon, Jun 5, 2017 at 8:15 PM Michael Armbrust 
>> wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark
>>> version 2.2.0. The vote is open until Thurs, June 8th, 2017 at 12:00
>>> PST and passes if a majority of at least 3 +1 PMC votes are cast.
>>>
>>> [ ] +1 Release this package as Apache Spark 2.2.0
>>> [ ] -1 Do not release this package because ...
>>>
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v2.2.0-rc4
>>> (377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e)
>>>
>>> List of JIRA tickets resolved can be found with this filter
>>> 
>>> .
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://people.apache.org/keys/committer/pwendell.asc
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1241/
>>>
>>> The documentation corresponding to this release can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-docs/
>>>
>>>
>>> *FAQ*
>>>
>>> *How can I help test this release?*
>>>
>>> If you are a Spark user, you can help us test this release by taking an
>>> existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> *What should happen to JIRA tickets still targeting 2.2.0?*
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should be
>>> worked on immediately. Everything else please retarget to 2.3.0 or 2.2.1.
>>>
>>> *But my bug isn't fixed!??!*
>>>
>>> In order to make timely releases, we will typically not hold the release
>>> unless the bug in question is a regression from 2.1.1.
>>>

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-05 Thread Dong Joon Hyun
Hi, Michael.

Can we be clearer about the deprecation messages in the 2.2.0-RC4 documentation?

> Spark runs on Java 8+, Python 2.6+/3.4+ and R 3.1+.
-> Python 2.7+ ?
https://issues.apache.org/jira/browse/SPARK-12661  (Status: `Open`, Target 
Version: `2.2.0`, Label: `ReleaseNotes`)

> Note that support for Python 2.6 is deprecated as of Spark 2.0.0, and support 
> for Scala 2.10 and versions of Hadoop before 2.6 are deprecated as of Spark 
> 2.1.0, and may be removed in Spark 2.2.0.
-> Support for versions of Hadoop before 2.6.5 is removed as of 2.2.0.
-> Support for Scala 2.10 may be removed in Spark 2.3.0.

Since this is a doc-only issue, can we revise this without affecting the RC4 
vote?

I created a PR for this, https://github.com/apache/spark/pull/18207.

Bests,
Dongjoon.


From: Michael Armbrust <mich...@databricks.com>
Date: Monday, June 5, 2017 at 12:51 PM
To: Sean Owen <so...@cloudera.com>
Cc: "dev@spark.apache.org" <dev@spark.apache.org>
Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)

I commented on that JIRA, I don't think that should block the release.  We can 
support both options long term if this vote passes.  Looks like the remaining 
JIRAs are doc/website updates that can happen after the vote or QA that should 
be done on this RC.  I think we are ready to start testing this release 
seriously!

On Mon, Jun 5, 2017 at 12:40 PM, Sean Owen 
<so...@cloudera.com<mailto:so...@cloudera.com>> wrote:
Xiao opened a blocker on 2.2.0 this morning:

SPARK-20980 Rename the option `wholeFile` to `multiLine` for JSON and CSV

I don't see that this should block?

We still have 7 Critical issues:

SPARK-20520 R streaming tests failed on Windows
SPARK-20512 SparkR 2.2 QA: Programming guide, migration guide, vignettes updates
SPARK-20499 Spark MLlib, GraphX 2.2 QA umbrella
SPARK-20508 Spark R 2.2 QA umbrella
SPARK-20513 Update SparkR website for 2.2
SPARK-20510 SparkR 2.2 QA: Update user guide for new features & APIs
SPARK-20507 Update MLlib, GraphX websites for 2.2

I'm going to assume that the R test issue isn't actually that big a deal, and 
that the 2.2 items are done. Anything that really is for 2.2 needs to block the 
release; Joseph what's the status on those?

On Mon, Jun 5, 2017 at 8:15 PM Michael Armbrust 
<mich...@databricks.com<mailto:mich...@databricks.com>> wrote:
Please vote on releasing the following candidate as Apache Spark version 2.2.0. 
The vote is open until Thurs, June 8th, 2017 at 12:00 PST and passes if a 
majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.2.0
[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is 
v2.2.0-rc4<https://github.com/apache/spark/tree/v2.2.0-rc4> 
(377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e)

List of JIRA tickets resolved can be found with this 
filter<https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.2.0>.

The release files, including signatures, digests, etc. can be found at:
http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1241/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-docs/


FAQ

How can I help test this release?

If you are a Spark user, you can help us test this release by taking an 
existing Spark workload and running on this release candidate, then reporting 
any regressions.

What should happen to JIRA tickets still targeting 2.2.0?

Committers should look at those and triage. Extremely important bug fixes, 
documentation, and API tweaks that impact compatibility should be worked on 
immediately. Everything else please retarget to 2.3.0 or 2.2.1.

But my bug isn't fixed!??!

In order to make timely releases, we will typically not hold the release unless 
the bug in question is a regression from 2.1.1.



Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-05 Thread Michael Armbrust
I commented on that JIRA; I don't think it should block the release. We
can support both options long-term if this vote passes. The remaining
JIRAs look like doc/website updates that can happen after the vote, or QA
that should be done on this RC. I think we are ready to start testing
this release seriously!

On Mon, Jun 5, 2017 at 12:40 PM, Sean Owen wrote:

> [quoted text trimmed; Sean's message, and the vote email it quotes, appear in full below]


Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-05 Thread Sean Owen
Xiao opened a blocker on 2.2.0 this morning:

SPARK-20980 Rename the option `wholeFile` to `multiLine` for JSON and CSV

I don't see that this should block?

We still have 7 Critical issues:

SPARK-20520 R streaming tests failed on Windows
SPARK-20512 SparkR 2.2 QA: Programming guide, migration guide, vignettes
updates
SPARK-20499 Spark MLlib, GraphX 2.2 QA umbrella
SPARK-20508 Spark R 2.2 QA umbrella
SPARK-20513 Update SparkR website for 2.2
SPARK-20510 SparkR 2.2 QA: Update user guide for new features & APIs
SPARK-20507 Update MLlib, GraphX websites for 2.2

I'm going to assume that the R test issue isn't actually that big a deal,
and that the 2.2 items are done. Anything that really is for 2.2 needs to
block the release; Joseph what's the status on those?

On Mon, Jun 5, 2017 at 8:15 PM Michael Armbrust wrote:

> [quoted RC4 vote email trimmed; identical to the full text at the top of this thread]


[VOTE] Apache Spark 2.2.0 (RC4)

2017-06-05 Thread Michael Armbrust
[Text of the RC4 vote email, reproduced in full at the top of this thread.]