Thanks, I added the details of my environment to the JIRA (for what it's worth now that the issue has been identified).

On Wed, 14 Jun 2017 at 11:28 Hyukjin Kwon <gurwls...@gmail.com> wrote:

> Actually, I opened https://issues.apache.org/jira/browse/SPARK-21093.
>
> 2017-06-14 17:08 GMT+09:00 Hyukjin Kwon <gurwls...@gmail.com>:
>
>> For a shorter reproducer:
>>
>> df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d"))
>> collect(gapply(df, "a", function(key, x) { x }, schema(df)))
>>
>> Running the line below multiple times (5~7)
>>
>> collect(gapply(df, "a", function(key, x) { x }, schema(df)))
>>
>> occasionally throws an error.
>>
>> I will leave it here and will probably add more information once a JIRA
>> is open. This does not look like a regression anyway.
>>
>> 2017-06-14 16:22 GMT+09:00 Hyukjin Kwon <gurwls...@gmail.com>:
>>
>>> Per https://github.com/apache/spark/tree/v2.1.1,
>>>
>>> 1. CentOS 7.2.1511 / R 3.3.3 - this test hangs.
>>>
>>> I messed things up a bit while downgrading R to 3.3.3 (it was an actual
>>> machine, not a VM), so it took me a while to retry this.
>>> I rebuilt it and confirmed that the R version is 3.3.3, at least. I
>>> hope someone can double-check this one.
>>>
>>> Here is a self-contained reproducer:
>>>
>>> irisDF <- suppressWarnings(createDataFrame(iris))
>>> schema <- structType(structField("Sepal_Length", "double"),
>>>                      structField("Avg", "double"))
>>> df4 <- gapply(
>>>   cols = "Sepal_Length",
>>>   irisDF,
>>>   function(key, x) {
>>>     y <- data.frame(key, mean(x$Sepal_Width), stringsAsFactors = FALSE)
>>>   },
>>>   schema)
>>> collect(df4)
>>>
>>> 2017-06-14 16:07 GMT+09:00 Felix Cheung <felixcheun...@hotmail.com>:
>>>
>>>> Thanks!
>>>> Will try to set up RHEL/CentOS to test it out.
>>>>
>>>> _____________________________
>>>> From: Nick Pentreath <nick.pentre...@gmail.com>
>>>> Sent: Tuesday, June 13, 2017 11:38 PM
>>>> Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
>>>> To: Felix Cheung <felixcheun...@hotmail.com>, Hyukjin Kwon <gurwls...@gmail.com>, dev <dev@spark.apache.org>
>>>> Cc: Sean Owen <so...@cloudera.com>
>>>>
>>>> Hi, yeah, sorry for the slow response. I was on RHEL and OpenJDK, but I
>>>> will have to report back later with the versions, as I am AFK.
>>>>
>>>> I'm not totally sure about the R version, but again, I will get back
>>>> to you ASAP.
>>>>
>>>> On Wed, 14 Jun 2017 at 05:09, Felix Cheung <felixcheun...@hotmail.com> wrote:
>>>>
>>>>> Thanks.
>>>>> This was with an external package and is unrelated:
>>>>>
>>>>> >> macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning (
>>>>> https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)
>>>>>
>>>>> As for CentOS, would it be possible to test against an R version older
>>>>> than 3.4.0? This is the same error reported by Nick below.
>>>>>
>>>>> _____________________________
>>>>> From: Hyukjin Kwon <gurwls...@gmail.com>
>>>>> Sent: Tuesday, June 13, 2017 8:02 PM
>>>>> Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
>>>>> To: dev <dev@spark.apache.org>
>>>>> Cc: Sean Owen <so...@cloudera.com>, Nick Pentreath <nick.pentre...@gmail.com>, Felix Cheung <felixcheun...@hotmail.com>
>>>>>
>>>>> For the test failure on R, I checked:
>>>>>
>>>>> Per https://github.com/apache/spark/tree/v2.2.0-rc4,
>>>>>
>>>>> 1. Windows Server 2012 R2 / R 3.3.1 - passed (https://ci.appveyor.com/project/spark-test/spark/build/755-r-test-v2.2.0-rc4)
>>>>> 2. macOS Sierra 10.12.3 / R 3.4.0 - passed
>>>>> 3. macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning (https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)
>>>>> 4. CentOS 7.2.1511 / R 3.4.0 - reproduced (https://gist.github.com/HyukjinKwon/2a736b9f80318618cc147ac2bb1a987d)
>>>>>
>>>>> Per https://github.com/apache/spark/tree/v2.1.1,
>>>>>
>>>>> 1. CentOS 7.2.1511 / R 3.4.0 - reproduced (https://gist.github.com/HyukjinKwon/6064b0d10bab8fc1dc6212452d83b301)
>>>>>
>>>>> Given my tests and observations, this appears to fail only on
>>>>> CentOS 7.2.1511 / R 3.4.0.
>>>>>
>>>>> It also fails in Spark 2.1.1, so it does not seem to be a regression,
>>>>> although it is a bug that should be fixed (whether in Spark or R).
>>>>>
>>>>> 2017-06-14 8:28 GMT+09:00 Xiao Li <gatorsm...@gmail.com>:
>>>>>
>>>>>> -1
>>>>>>
>>>>>> Spark 2.2 is unable to read partitioned tables created by Spark
>>>>>> 2.1 or earlier.
>>>>>>
>>>>>> Opened a JIRA: https://issues.apache.org/jira/browse/SPARK-21085
>>>>>>
>>>>>> Will fix it soon.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Xiao Li
>>>>>>
>>>>>> 2017-06-13 9:39 GMT-07:00 Joseph Bradley <jos...@databricks.com>:
>>>>>>
>>>>>>> Re: the QA JIRAs:
>>>>>>> Thanks for discussing them. I still feel they are very helpful; I
>>>>>>> particularly notice not having to spend a solid 2-3 weeks of time QAing
>>>>>>> (unlike in earlier Spark releases). One other point not mentioned above:
>>>>>>> I think they serve as a very helpful reminder/training for the community
>>>>>>> for rigor in development. Since we instituted QA JIRAs, contributors have
>>>>>>> been a lot better about adding in docs early, rather than waiting until
>>>>>>> the end of the cycle (though I know this is drawing conclusions from
>>>>>>> correlations).
>>>>>>>
>>>>>>> I would vote in favor of the RC...but I'll wait to see about the
>>>>>>> reported failures.
>>>>>>>
>>>>>>> On Fri, Jun 9, 2017 at 3:30 PM, Sean Owen <so...@cloudera.com> wrote:
>>>>>>>
>>>>>>>> These are different errors from those in
>>>>>>>> https://issues.apache.org/jira/browse/SPARK-20520, but that issue
>>>>>>>> also reports R test failures.
>>>>>>>>
>>>>>>>> I went back and tried to run the R tests and they passed, at least
>>>>>>>> on Ubuntu 17 / R 3.3.
>>>>>>>>
>>>>>>>> On Fri, Jun 9, 2017 at 9:12 AM Nick Pentreath <nick.pentre...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> All Scala and Python tests pass. ML QA and doc issues are resolved
>>>>>>>>> (as well as R, it seems).
>>>>>>>>>
>>>>>>>>> However, I'm seeing the following test failure on R consistently:
>>>>>>>>> https://gist.github.com/MLnick/5f26152f97ae8473f807c6895817cf72
>>>>>>>>>
>>>>>>>>> On Thu, 8 Jun 2017 at 08:48 Denny Lee <denny.g....@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> +1 non-binding
>>>>>>>>>>
>>>>>>>>>> Tested on macOS Sierra and Ubuntu 16.04. The test suite includes
>>>>>>>>>> various test cases, including Spark SQL, ML, GraphFrames, and
>>>>>>>>>> Structured Streaming.
>>>>>>>>>>
>>>>>>>>>> On Wed, Jun 7, 2017 at 9:40 PM vaquar khan <vaquar.k...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1 non-binding
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> vaquar khan
>>>>>>>>>>>
>>>>>>>>>>> On Jun 7, 2017 4:32 PM, "Ricardo Almeida" <ricardo.alme...@actnowib.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> +1 (non-binding)
>>>>>>>>>>>
>>>>>>>>>>> Built and tested with -Phadoop-2.7 -Dhadoop.version=2.7.3
>>>>>>>>>>> -Pyarn -Phive -Phive-thriftserver -Pscala-2.11 on
>>>>>>>>>>>
>>>>>>>>>>> - Ubuntu 17.04, Java 8 (OpenJDK 1.8.0_111)
>>>>>>>>>>> - macOS 10.12.5, Java 8 (build 1.8.0_131)
>>>>>>>>>>>
>>>>>>>>>>> On 5 June 2017 at 21:14, Michael Armbrust <mich...@databricks.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Please vote on releasing the following candidate as Apache
>>>>>>>>>>>> Spark version 2.2.0.
>>>>>>>>>>>> The vote is open until Thurs, June 8th,
>>>>>>>>>>>> 2017 at 12:00 PST and passes if a majority of at least 3 +1 PMC
>>>>>>>>>>>> votes are cast.
>>>>>>>>>>>>
>>>>>>>>>>>> [ ] +1 Release this package as Apache Spark 2.2.0
>>>>>>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>>>>>>
>>>>>>>>>>>> To learn more about Apache Spark, please see
>>>>>>>>>>>> http://spark.apache.org/
>>>>>>>>>>>>
>>>>>>>>>>>> The tag to be voted on is v2.2.0-rc4
>>>>>>>>>>>> <https://github.com/apache/spark/tree/v2.2.0-rc4>
>>>>>>>>>>>> (377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e)
>>>>>>>>>>>>
>>>>>>>>>>>> The list of resolved JIRA tickets can be found with this filter
>>>>>>>>>>>> <https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.2.0>.
>>>>>>>>>>>>
>>>>>>>>>>>> The release files, including signatures, digests, etc., can be
>>>>>>>>>>>> found at:
>>>>>>>>>>>> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-bin/
>>>>>>>>>>>>
>>>>>>>>>>>> Release artifacts are signed with the following key:
>>>>>>>>>>>> https://people.apache.org/keys/committer/pwendell.asc
>>>>>>>>>>>>
>>>>>>>>>>>> The staging repository for this release can be found at:
>>>>>>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1241/
>>>>>>>>>>>>
>>>>>>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>>>>>>> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-docs/
>>>>>>>>>>>>
>>>>>>>>>>>> *FAQ*
>>>>>>>>>>>>
>>>>>>>>>>>> *How can I help test this release?*
>>>>>>>>>>>>
>>>>>>>>>>>> If you are a Spark user, you can help us test this release by
>>>>>>>>>>>> taking an existing Spark workload, running it on this release
>>>>>>>>>>>> candidate, and then reporting any regressions.
>>>>>>>>>>>>
>>>>>>>>>>>> *What should happen to JIRA tickets still targeting 2.2.0?*
>>>>>>>>>>>>
>>>>>>>>>>>> Committers should look at those and triage. Extremely important
>>>>>>>>>>>> bug fixes, documentation, and API tweaks that impact compatibility
>>>>>>>>>>>> should be worked on immediately. Everything else please retarget
>>>>>>>>>>>> to 2.3.0 or 2.2.1.
>>>>>>>>>>>>
>>>>>>>>>>>> *But my bug isn't fixed!??!*
>>>>>>>>>>>>
>>>>>>>>>>>> In order to make timely releases, we will typically not hold
>>>>>>>>>>>> the release unless the bug in question is a regression from 2.1.1.
>>>>>>>
>>>>>>> --
>>>>>>> Joseph Bradley
>>>>>>> Software Engineer - Machine Learning
>>>>>>> Databricks, Inc.