Sounds good. Think we checked and should be good to go. Appreciated.
________________________________
From: Michael Armbrust
Sent: Wednesday, June 14, 2017 4:51:48 PM
To: Hyukjin Kwon
Cc: Felix Cheung; Nick Pentreath; dev; Sean Owen
Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)

So, it looks like SPARK-21085 <https://issues.apache.org/jira/browse/SPARK-21085> has been fixed and SPARK-21093 <https://issues.apache.org/jira/browse/SPARK-21093> is not a regression. Last call before I cut RC5.

On Wed, Jun 14, 2017 at 2:28 AM, Hyukjin Kwon wrote:

Actually, I opened - https://issues.apache.org/jira/browse/SPARK-21093.

2017-06-14 17:08 GMT+09:00 Hyukjin Kwon:

For a shorter reproducer ...

    df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d"))
    collect(gapply(df, "a", function(key, x) { x }, schema(df)))

And running the below multiple times (5~7):

    collect(gapply(df, "a", function(key, x) { x }, schema(df)))

looks occasionally throwing an error.

I will leave here and probably explain more information if a JIRA is open. This does not look a regression anyway.

2017-06-14 16:22 GMT+09:00 Hyukjin Kwon:

Per https://github.com/apache/spark/tree/v2.1.1,

1. CentOS 7.2.1511 / R 3.3.3 - this test hangs.

I messed it up a bit while downgrading the R to 3.3.3 (It was an actual machine not a VM) so it took me a while to re-try this.
I re-built this again and checked the R version is 3.3.3 at least. I hope this one could double checked.

Here is the self-reproducer:

    irisDF <- suppressWarnings(createDataFrame(iris))
    schema <- structType(structField("Sepal_Length", "double"), structField("Avg", "double"))
    df4 <- gapply(
      cols = "Sepal_Length",
      irisDF,
      function(key, x) {
        y <- data.frame(key, mean(x$Sepal_Width), stringsAsFactors = FALSE)
      },
      schema)
    collect(df4)

2017-06-14 16:07 GMT+09:00 Felix Cheung:

Thanks!
Will try to setup RHEL/CentOS to test it out

_____________________________
From: Nick Pentreath
Sent: Tuesday, June 13, 2017 11:38 PM
Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
To: Felix Cheung, Hyukjin Kwon, dev
Cc: Sean Owen

Hi yeah sorry for slow response - I was RHEL and OpenJDK but will have to report back later with the versions as am AFK.

R version not totally sure but again will revert asap

On Wed, 14 Jun 2017 at 05:09, Felix Cheung wrote:

Thanks
This was with an external package and unrelated

>> macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning (https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)

As for CentOS - would it be possible to test against R older than 3.4.0? This is the same error reported by Nick below.

_____________________________
From: Hyukjin Kwon
Sent: Tuesday, June 13, 2017 8:02 PM
Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
To: dev
Cc: Sean Owen, Nick Pentreath, Felix Cheung

For the test failure on R, I checked:

Per https://github.com/apache/spark/tree/v2.2.0-rc4,

1. Windows Server 2012 R2 / R 3.3.1 - passed (https://ci.appveyor.com/project/spark-test/spark/build/755-r-test-v2.2.0-rc4)
2. macOS Sierra 10.12.3 / R 3.4.0 - passed
3. macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning (https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)
4. CentOS 7.2.1511 / R 3.4.0 - reproduced (https://gist.github.com/HyukjinKwon/2a736b9f80318618cc147ac2bb1a987d)

Per https://github.com/apache/spark/tree/v2.1.1,

1. CentOS 7.2.1511 / R 3.4.0 - reproduced (https://gist.github.com/HyukjinKwon/6064b0d10bab8fc1dc6212452d83b301)

This looks being failed only in CentOS 7.2.1511 / R 3.4.0 given my tests and observations.

This is failed in Spark 2.1.1. So, it sounds not a regression although it is a bug that should be fixed (whether in Spark or R).

2017-06-14 8:28 GMT+09:00 Xiao Li:

-1

Spark 2.2 is unable to read the partitioned table created by Spark 2.1 or earlier.

Opened a JIRA https://issues.apache.org/jira/browse/SPARK-21085

Will fix it soon.

Thanks,

Xiao Li

2017-06-13 9:39 GMT-07:00 Joseph Bradley:

Re: the QA JIRAs:
Thanks for discussing them. I still feel they are very helpful; I particularly notice not having to spend a solid 2-3 weeks of time QAing (unlike in earlier Spark releases). One other point not mentioned above: I think they serve as a very helpful reminder/training for the community for rigor in development. Since we instituted QA JIRAs, contributors have been a lot better about adding in docs early, rather than waiting until the end of the cycle (though I know this is drawing conclusions from correlations).

I would vote in favor of the RC...but I'll wait to see about the reported failures.

On Fri, Jun 9, 2017 at 3:30 PM, Sean Owen wrote:

Different errors as in https://issues.apache.org/jira/browse/SPARK-20520 but that's also reporting R test failures.

I went back and tried to run the R tests and they passed, at least on Ubuntu 17 / R 3.3.

On Fri, Jun 9, 2017 at 9:12 AM Nick Pentreath wrote:

All Scala, Python tests pass. ML QA and doc issues are resolved (as well as R it seems).
However, I'm seeing the following test failure on R consistently: https://gist.github.com/MLnick/5f26152f97ae8473f807c6895817cf72

On Thu, 8 Jun 2017 at 08:48 Denny Lee wrote:

+1 non-binding

Tested on macOS Sierra, Ubuntu 16.04
test suite includes various test cases including Spark SQL, ML, GraphFrames, Structured Streaming

On Wed, Jun 7, 2017 at 9:40 PM vaquar khan wrote:

+1 non-binding

Regards,
vaquar khan

On Jun 7, 2017 4:32 PM, "Ricardo Almeida" wrote:

+1 (non-binding)

Built and tested with -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn -Phive -Phive-thriftserver -Pscala-2.11 on

* Ubuntu 17.04, Java 8 (OpenJDK 1.8.0_111)
* macOS 10.12.5 Java 8 (build 1.8.0_131)

On 5 June 2017 at 21:14, Michael Armbrust wrote:

Please vote on releasing the following candidate as Apache Spark version 2.2.0. The vote is open until Thurs, June 8th, 2017 at 12:00 PST and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.2.0-rc4 <https://github.com/apache/spark/tree/v2.2.0-rc4> (377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e)

List of JIRA tickets resolved can be found with this filter <https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.2.0>.

The release files, including signatures, digests, etc. can be found at:
http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1241/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-docs/

FAQ

How can I help test this release?

If you are a Spark user, you can help us test this release by taking an existing Spark workload and running on this release candidate, then reporting any regressions.

What should happen to JIRA tickets still targeting 2.2.0?

Committers should look at those and triage. Extremely important bug fixes, documentation, and API tweaks that impact compatibility should be worked on immediately. Everything else please retarget to 2.3.0 or 2.2.1.

But my bug isn't fixed!??!

In order to make timely releases, we will typically not hold the release unless the bug in question is a regression from 2.1.1.

--
Joseph Bradley
Software Engineer - Machine Learning
Databricks, Inc.
http://databricks.com/
