Re: [VOTE] Apache Spark 2.2.0 (RC4)

Felix Cheung Thu, 15 Jun 2017 09:36:07 -0700

Sounds good.

Think we checked and should be good to go. Appreciated.

________________________________
From: Michael Armbrust <mich...@databricks.com>
Sent: Wednesday, June 14, 2017 4:51:48 PM
To: Hyukjin Kwon
Cc: Felix Cheung; Nick Pentreath; dev; Sean Owen
Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)

So, it looks like SPARK-21085 
(https://issues.apache.org/jira/browse/SPARK-21085) has been fixed and 
SPARK-21093 (https://issues.apache.org/jira/browse/SPARK-21093) is not a 
regression.  Last call before I cut RC5.

On Wed, Jun 14, 2017 at 2:28 AM, Hyukjin Kwon <gurwls...@gmail.com> wrote:
Actually, I opened - https://issues.apache.org/jira/browse/SPARK-21093.

2017-06-14 17:08 GMT+09:00 Hyukjin Kwon <gurwls...@gmail.com>:
For a shorter reproducer ...

df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d"))
collect(gapply(df, "a", function(key, x) { x }, schema(df)))

And running the line below multiple times (5 to 7):

collect(gapply(df, "a", function(key, x) { x }, schema(df)))

occasionally throws an error.

I will leave it here and can add more detail once a JIRA is opened. This does 
not look like a regression anyway.

2017-06-14 16:22 GMT+09:00 Hyukjin Kwon <gurwls...@gmail.com>:

Per https://github.com/apache/spark/tree/v2.1.1,

1. CentOS 7.2.1511 / R 3.3.3 - this test hangs.

I messed things up a bit while downgrading R to 3.3.3 (it was a physical 
machine, not a VM), so it took me a while to retry this.
I rebuilt and confirmed that the R version is 3.3.3. I hope someone can 
double-check this one.

Here is the self-reproducer:

irisDF <- suppressWarnings(createDataFrame(iris))
schema <- structType(structField("Sepal_Length", "double"),
                     structField("Avg", "double"))
df4 <- gapply(
  cols = "Sepal_Length",
  irisDF,
  function(key, x) {
    y <- data.frame(key, mean(x$Sepal_Width), stringsAsFactors = FALSE)
  },
  schema)
collect(df4)

2017-06-14 16:07 GMT+09:00 Felix Cheung <felixcheun...@hotmail.com>:
Thanks! Will try to setup RHEL/CentOS to test it out

_____________________________
From: Nick Pentreath <nick.pentre...@gmail.com>
Sent: Tuesday, June 13, 2017 11:38 PM
Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
To: Felix Cheung <felixcheun...@hotmail.com>, Hyukjin Kwon 
<gurwls...@gmail.com>, dev <dev@spark.apache.org>
Cc: Sean Owen <so...@cloudera.com>

Hi, yeah, sorry for the slow response - I was on RHEL and OpenJDK, but will 
have to report back later with the versions as I am AFK.

Not totally sure about the R version, but again, I will report back ASAP.
On Wed, 14 Jun 2017 at 05:09, Felix Cheung <felixcheun...@hotmail.com> wrote:
Thanks
This was with an external package and unrelated

  >> macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning 
(https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)

As for CentOS - would it be possible to test against R older than 3.4.0? This 
is the same error reported by Nick below.

_____________________________
From: Hyukjin Kwon <gurwls...@gmail.com>
Sent: Tuesday, June 13, 2017 8:02 PM
Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
To: dev <dev@spark.apache.org>
Cc: Sean Owen <so...@cloudera.com>, Nick Pentreath <nick.pentre...@gmail.com>, 
Felix Cheung <felixcheun...@hotmail.com>

For the test failure on R, I checked:

Per https://github.com/apache/spark/tree/v2.2.0-rc4,

1. Windows Server 2012 R2 / R 3.3.1 - passed 
(https://ci.appveyor.com/project/spark-test/spark/build/755-r-test-v2.2.0-rc4)
2. macOS Sierra 10.12.3 / R 3.4.0 - passed
3. macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning 
(https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)
4. CentOS 7.2.1511 / R 3.4.0 - reproduced 
(https://gist.github.com/HyukjinKwon/2a736b9f80318618cc147ac2bb1a987d)

Per https://github.com/apache/spark/tree/v2.1.1,

1. CentOS 7.2.1511 / R 3.4.0 - reproduced 
(https://gist.github.com/HyukjinKwon/6064b0d10bab8fc1dc6212452d83b301)

Given my tests and observations, this appears to fail only on CentOS 7.2.1511 / 
R 3.4.0.

It also fails in Spark 2.1.1, so it does not sound like a regression, although 
it is a bug that should be fixed (whether in Spark or R).

2017-06-14 8:28 GMT+09:00 Xiao Li <gatorsm...@gmail.com>:
-1

Spark 2.2 is unable to read the partitioned table created by Spark 2.1 or 
earlier.

Opened a JIRA https://issues.apache.org/jira/browse/SPARK-21085

Will fix it soon.

Thanks,

Xiao Li

2017-06-13 9:39 GMT-07:00 Joseph Bradley <jos...@databricks.com>:
Re: the QA JIRAs:
Thanks for discussing them.  I still feel they are very helpful; I particularly 
notice not having to spend a solid 2-3 weeks of time QAing (unlike in earlier 
Spark releases).  One other point not mentioned above: I think they serve as a 
very helpful reminder/training for the community for rigor in development.  
Since we instituted QA JIRAs, contributors have been a lot better about adding 
in docs early, rather than waiting until the end of the cycle (though I know 
this is drawing conclusions from correlations).

I would vote in favor of the RC...but I'll wait to see about the reported 
failures.

On Fri, Jun 9, 2017 at 3:30 PM, Sean Owen <so...@cloudera.com> wrote:
Different errors as in https://issues.apache.org/jira/browse/SPARK-20520 but 
that's also reporting R test failures.

I went back and tried to run the R tests and they passed, at least on Ubuntu 17 
/ R 3.3.

On Fri, Jun 9, 2017 at 9:12 AM Nick Pentreath <nick.pentre...@gmail.com> wrote:
All Scala, Python tests pass. ML QA and doc issues are resolved (as well as R 
it seems).

However, I'm seeing the following test failure on R consistently: 
https://gist.github.com/MLnick/5f26152f97ae8473f807c6895817cf72

On Thu, 8 Jun 2017 at 08:48 Denny Lee <denny.g....@gmail.com> wrote:
+1 non-binding

Tested on macOS Sierra, Ubuntu 16.04
test suite includes various test cases including Spark SQL, ML, GraphFrames, 
Structured Streaming

On Wed, Jun 7, 2017 at 9:40 PM vaquar khan <vaquar.k...@gmail.com> wrote:
+1 non-binding

Regards,
vaquar khan

On Jun 7, 2017 4:32 PM, "Ricardo Almeida" <ricardo.alme...@actnowib.com> wrote:
+1 (non-binding)

Built and tested with -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn -Phive 
-Phive-thriftserver -Pscala-2.11 on

  *   Ubuntu 17.04, Java 8 (OpenJDK 1.8.0_111)
  *   macOS 10.12.5 Java 8 (build 1.8.0_131)

On 5 June 2017 at 21:14, Michael Armbrust <mich...@databricks.com> wrote:
Please vote on releasing the following candidate as Apache Spark version 2.2.0. 
The vote is open until Thurs, June 8th, 2017 at 12:00 PST and passes if a 
majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is 
v2.2.0-rc4<https://github.com/apache/spark/tree/v2.2.0-rc4> 
(377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e)

List of JIRA tickets resolved can be found with this 
filter<https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.2.0>.

The release files, including signatures, digests, etc. can be found at:
http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1241/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-docs/

FAQ

How can I help test this release?

If you are a Spark user, you can help us test this release by taking an 
existing Spark workload and running on this release candidate, then reporting 
any regressions.

What should happen to JIRA tickets still targeting 2.2.0?

Committers should look at those and triage. Extremely important bug fixes, 
documentation, and API tweaks that impact compatibility should be worked on 
immediately. Everything else please retarget to 2.3.0 or 2.2.1.

But my bug isn't fixed!??!

In order to make timely releases, we will typically not hold the release unless 
the bug in question is a regression from 2.1.1.

--

Joseph Bradley

Software Engineer - Machine Learning

Databricks, Inc.

