Sounds good. I think we checked and should be good to go. Appreciated.
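The gapply() reproducers later in this thread all follow SparkR's grouped-apply pattern: split the rows by a grouping column, run a function on each group, and bind the per-group results back together. A minimal plain-Python sketch of those semantics (no Spark required; the helper names here are illustrative, not part of any Spark API):

```python
# Plain-Python sketch of SparkR gapply() semantics, for readers without an
# R/Spark setup. `gapply_sketch` is a hypothetical helper, not a Spark API.
from collections import defaultdict

def gapply_sketch(rows, key_col, func):
    """Group `rows` (a list of dicts) by `key_col`, apply `func(key, group)`
    to each group, and concatenate the returned row lists."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_col]].append(row)
    out = []
    for key in sorted(groups):  # deterministic order, for this sketch only
        out.extend(func(key, groups[key]))
    return out

rows = [{"a": 1, "b": 0.1}, {"a": 2, "b": 0.2}, {"a": 1, "b": 0.3}]
# Identity function per group, mirroring `function(key, x) { x }` in the thread.
result = gapply_sketch(rows, "a", lambda key, group: group)
```

In the reproducers below, the per-group function is either the identity (rows pass through unchanged) or an aggregate such as a mean over one column of the group.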
________________________________
From: Michael Armbrust <mich...@databricks.com>
Sent: Wednesday, June 14, 2017 4:51:48 PM
To: Hyukjin Kwon
Cc: Felix Cheung; Nick Pentreath; dev; Sean Owen
Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)

So, it looks like SPARK-21085 (https://issues.apache.org/jira/browse/SPARK-21085) has been fixed, and SPARK-21093 (https://issues.apache.org/jira/browse/SPARK-21093) is not a regression. Last call before I cut RC5.

On Wed, Jun 14, 2017 at 2:28 AM, Hyukjin Kwon <gurwls...@gmail.com> wrote:

Actually, I opened https://issues.apache.org/jira/browse/SPARK-21093.

2017-06-14 17:08 GMT+09:00 Hyukjin Kwon <gurwls...@gmail.com>:

For a shorter reproducer:

    df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d"))
    collect(gapply(df, "a", function(key, x) { x }, schema(df)))

Running the line below multiple times (5-7 times) occasionally throws an error:

    collect(gapply(df, "a", function(key, x) { x }, schema(df)))

I will leave it here and will probably add more information if a JIRA is opened. This does not look like a regression anyway.

2017-06-14 16:22 GMT+09:00 Hyukjin Kwon <gurwls...@gmail.com>:

Per https://github.com/apache/spark/tree/v2.1.1:

1. CentOS 7.2.1511 / R 3.3.3 - this test hangs.

I messed it up a bit while downgrading R to 3.3.3 (it was an actual machine, not a VM), so it took me a while to retry this. I rebuilt it and checked that the R version is 3.3.3, at least. I hope this one can be double-checked. Here is the self-reproducer:

    irisDF <- suppressWarnings(createDataFrame(iris))
    schema <- structType(structField("Sepal_Length", "double"),
                         structField("Avg", "double"))
    df4 <- gapply(
      irisDF,
      cols = "Sepal_Length",
      function(key, x) {
        y <- data.frame(key, mean(x$Sepal_Width), stringsAsFactors = FALSE)
      },
      schema)
    collect(df4)

2017-06-14 16:07 GMT+09:00 Felix Cheung <felixcheun...@hotmail.com>:

Thanks!
Will try to set up RHEL/CentOS to test it out.

_____________________________
From: Nick Pentreath <nick.pentre...@gmail.com>
Sent: Tuesday, June 13, 2017 11:38 PM
Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
To: Felix Cheung <felixcheun...@hotmail.com>, Hyukjin Kwon <gurwls...@gmail.com>, dev <dev@spark.apache.org>
Cc: Sean Owen <so...@cloudera.com>

Hi, yeah, sorry for the slow response - I was on RHEL and OpenJDK but will have to report back later with the versions, as I'm AFK. Not totally sure of the R version either, but again, will revert ASAP.

On Wed, 14 Jun 2017 at 05:09, Felix Cheung <felixcheun...@hotmail.com> wrote:

Thanks. This one was with an external package and unrelated:

>> macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning (https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)

As for CentOS - would it be possible to test against R older than 3.4.0? This is the same error reported by Nick below.

_____________________________
From: Hyukjin Kwon <gurwls...@gmail.com>
Sent: Tuesday, June 13, 2017 8:02 PM
Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
To: dev <dev@spark.apache.org>
Cc: Sean Owen <so...@cloudera.com>, Nick Pentreath <nick.pentre...@gmail.com>, Felix Cheung <felixcheun...@hotmail.com>

For the test failure on R, I checked:

Per https://github.com/apache/spark/tree/v2.2.0-rc4:

1. Windows Server 2012 R2 / R 3.3.1 - passed (https://ci.appveyor.com/project/spark-test/spark/build/755-r-test-v2.2.0-rc4)
2. macOS Sierra 10.12.3 / R 3.4.0 - passed
3. macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning (https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)
4.
CentOS 7.2.1511 / R 3.4.0 - reproduced (https://gist.github.com/HyukjinKwon/2a736b9f80318618cc147ac2bb1a987d)

Per https://github.com/apache/spark/tree/v2.1.1:

1. CentOS 7.2.1511 / R 3.4.0 - reproduced (https://gist.github.com/HyukjinKwon/6064b0d10bab8fc1dc6212452d83b301)

Given my tests and observations, this appears to fail only on CentOS 7.2.1511 / R 3.4.0. It also fails in Spark 2.1.1, so it does not sound like a regression, although it is a bug that should be fixed (whether in Spark or in R).

2017-06-14 8:28 GMT+09:00 Xiao Li <gatorsm...@gmail.com>:

-1

Spark 2.2 is unable to read partitioned tables created by Spark 2.1 or earlier. Opened a JIRA: https://issues.apache.org/jira/browse/SPARK-21085. Will fix it soon.

Thanks,
Xiao Li

2017-06-13 9:39 GMT-07:00 Joseph Bradley <jos...@databricks.com>:

Re: the QA JIRAs: Thanks for discussing them. I still feel they are very helpful; in particular, I notice not having to spend a solid 2-3 weeks of time QAing (unlike in earlier Spark releases). One other point not mentioned above: I think they serve as a very helpful reminder/training for the community on rigor in development. Since we instituted QA JIRAs, contributors have been a lot better about adding docs early, rather than waiting until the end of the cycle (though I know this is drawing conclusions from correlations).

I would vote in favor of the RC... but I'll wait to see about the reported failures.

On Fri, Jun 9, 2017 at 3:30 PM, Sean Owen <so...@cloudera.com> wrote:

Different errors from those in https://issues.apache.org/jira/browse/SPARK-20520, but that's also reporting R test failures. I went back and tried to run the R tests, and they passed, at least on Ubuntu 17 / R 3.3.

On Fri, Jun 9, 2017 at 9:12 AM Nick Pentreath <nick.pentre...@gmail.com> wrote:

All Scala and Python tests pass. ML QA and doc issues are resolved (as well as R, it seems).
However, I'm consistently seeing the following R test failure: https://gist.github.com/MLnick/5f26152f97ae8473f807c6895817cf72

On Thu, 8 Jun 2017 at 08:48 Denny Lee <denny.g....@gmail.com> wrote:

+1 non-binding

Tested on macOS Sierra and Ubuntu 16.04; the test suite includes various test cases covering Spark SQL, ML, GraphFrames, and Structured Streaming.

On Wed, Jun 7, 2017 at 9:40 PM vaquar khan <vaquar.k...@gmail.com> wrote:

+1 non-binding

Regards,
vaquar khan

On Jun 7, 2017 4:32 PM, "Ricardo Almeida" <ricardo.alme...@actnowib.com> wrote:

+1 (non-binding)

Built and tested with -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn -Phive -Phive-thriftserver -Pscala-2.11 on:

* Ubuntu 17.04, Java 8 (OpenJDK 1.8.0_111)
* macOS 10.12.5, Java 8 (build 1.8.0_131)

On 5 June 2017 at 21:14, Michael Armbrust <mich...@databricks.com> wrote:

Please vote on releasing the following candidate as Apache Spark version 2.2.0. The vote is open until Thurs, June 8th, 2017 at 12:00 PST and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.2.0-rc4 (https://github.com/apache/spark/tree/v2.2.0-rc4, commit 377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e).

The list of resolved JIRA tickets can be found with this filter: https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.2.0

The release files, including signatures, digests, etc.,
can be found at:
http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1241/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-docs/

FAQ

How can I help test this release?

If you are a Spark user, you can help us test this release by taking an existing Spark workload, running it on this release candidate, and then reporting any regressions.

What should happen to JIRA tickets still targeting 2.2.0?

Committers should look at those and triage. Extremely important bug fixes, documentation, and API tweaks that impact compatibility should be worked on immediately. Everything else, please retarget to 2.3.0 or 2.2.1.

But my bug isn't fixed!??!

In order to make timely releases, we will typically not hold the release unless the bug in question is a regression from 2.1.1.

--
Joseph Bradley
Software Engineer - Machine Learning
Databricks, Inc.
http://databricks.com
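The release-vote rule quoted above (the vote "passes if a majority of at least 3 +1 PMC votes are cast") can be sketched as a small check, assuming the usual reading that binding +1s must outnumber binding -1s and number at least three; the function name is illustrative, not an ASF tool:

```python
# Sketch of the release-vote passing rule: at least three binding +1 votes,
# and more binding +1s than binding -1s. `vote_passes` is a hypothetical
# helper used only to make the rule concrete.
def vote_passes(plus_ones: int, minus_ones: int) -> bool:
    return plus_ones >= 3 and plus_ones > minus_ones

result = vote_passes(plus_ones=4, minus_ones=1)  # a 4-to-1 vote passes
```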