+1 (non-binding)

Regards,
Vaquar Khan
On Sun, Dec 18, 2016 at 2:33 PM, Adam Roberts <arobe...@uk.ibm.com> wrote:

+1 (non-binding)

Functional: looks good, tested with OpenJDK 8 (1.8.0_111) and IBM's latest SDK for Java (8 SR3 FP21).

Tests run clean on Ubuntu 16.04, Ubuntu 14.04, SUSE 12, and CentOS 7.2 on x86 and IBM-specific platforms, including big-endian. On slower machines I see the following tests failing, but nothing to be concerned about (timeouts):

- org.apache.spark.DistributedSuite.caching on disk
- org.apache.spark.rdd.LocalCheckpointSuite.missing checkpoint block fails with informative message
- org.apache.spark.sql.streaming.StreamingAggregationSuite.prune results by current_time, complete mode
- org.apache.spark.sql.streaming.StreamingAggregationSuite.prune results by current_date, complete mode
- org.apache.spark.sql.hive.HiveSparkSubmitSuite.set hive.metastore.warehouse.dir

Performance vs 2.0.2: lots of improvements seen using the HiBench and SparkSqlPerf benchmarks, tested on a 48-core Intel machine with the Kryo serializer in a controlled test environment. These are all open-source benchmarks anyone can use and experiment with. Elapsed times were measured; + scores are improvements (that much percent faster) and - scores are regressions I'm seeing.

- K-means: Java API +22% (100 sec to 78 sec), Scala API +30% (34 sec to 24 sec), Python API unchanged
- PageRank: minor improvement, 40 sec to 38 sec, +5%
- Sort: minor improvement, 10.8 sec to 9.8 sec, +10%
- WordCount: unchanged
- Bayes: mixed bag; sometimes much slower (95 sec to 140 sec), which is -47%, other times marginally faster by 15%; something to keep an eye on
- Terasort: +18% (39 sec to 32 sec) with the Java/Scala APIs

For the TPC-DS SQL queries the results are again a mixed bag: I see >10% boosts for q9, q68, q75, and q96, and >10% slowdowns for q7, q39a, q43, q52, q57, and q89.
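As a sanity check, the +/- scores above follow (before - after) / before on the elapsed times; a minimal sketch, using numbers taken from the list above:

```python
def pct_change(before_s: float, after_s: float) -> int:
    """Percent improvement in elapsed time; positive means faster."""
    return round((before_s - after_s) / before_s * 100)

# K-means, Java API: 100 sec down to 78 sec
print(pct_change(100, 78))   # 22
# Bayes worst case: 95 sec up to 140 sec
print(pct_change(95, 140))   # -47
```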
Five iterations, average times compared, changing only which version of Spark we're using.

From: Holden Karau <hol...@pigscanfly.ca>
To: Denny Lee <denny.g....@gmail.com>, Liwei Lin <lwl...@gmail.com>, "dev@spark.apache.org" <dev@spark.apache.org>
Date: 18/12/2016 20:05
Subject: Re: [VOTE] Apache Spark 2.1.0 (RC5)

+1 (non-binding) - checked Python artifacts with a virtual env.

On Sun, Dec 18, 2016 at 11:42 AM Denny Lee <denny.g....@gmail.com> wrote:

+1 (non-binding)

On Sat, Dec 17, 2016 at 11:45 PM Liwei Lin <lwl...@gmail.com> wrote:

+1

Cheers,
Liwei

On Sat, Dec 17, 2016 at 10:29 AM, Yuming Wang <wgy...@gmail.com> wrote:

I hope https://github.com/apache/spark/pull/16252 can be fixed before the 2.1.0 release. It's a fix for the case where a broadcast cannot fit in memory.

On Sat, Dec 17, 2016 at 10:23 AM, Joseph Bradley <jos...@databricks.com> wrote:

+1

On Fri, Dec 16, 2016 at 3:21 PM, Herman van Hövell tot Westerflier <hvanhov...@databricks.com> wrote:

+1

On Sat, Dec 17, 2016 at 12:14 AM, Xiao Li <gatorsm...@gmail.com> wrote:

+1

Xiao Li

2016-12-16 12:19 GMT-08:00 Felix Cheung <felixcheun...@hotmail.com>:

For R we have a license field in the DESCRIPTION file, and this is standard practice (and a requirement) for R packages.
https://cran.r-project.org/doc/manuals/R-exts.html#Licensing

From: Sean Owen <so...@cloudera.com>
Sent: Friday, December 16, 2016 9:57:15 AM
To: Reynold Xin; dev@spark.apache.org
Subject: Re: [VOTE] Apache Spark 2.1.0 (RC5)

(If you have a template for these emails, maybe update it to use https links. They work for apache.org domains. After all, we are asking people to verify the integrity of release artifacts, so it might as well be secure.)

(Also, the new archives use .tar.gz instead of .tgz like the others. No big deal; my OCD eye just noticed it.)

I don't see an Apache license / notice for the PySpark or SparkR artifacts. It would be good practice to include these in a convenience binary. I'm not sure if it's strictly mandatory, but it's something to adjust in any event. I think that's all there is to do for SparkR. PySpark, which packages a bunch of dependencies, does include the licenses (good), but I think it should also include the NOTICE file.

This is the first time I recall getting 0 test failures off the bat! I'm using Java 8 / Ubuntu 16 and the yarn/hive/hadoop-2.7 profiles.

I'd therefore +1 this unless someone knows that the license issue above is real and a blocker.

On Fri, Dec 16, 2016 at 5:17 AM Reynold Xin <r...@databricks.com> wrote:

Please vote on releasing the following candidate as Apache Spark version 2.1.0. The vote is open until Sun, December 18, 2016 at 21:30 PT and passes if a majority of at least 3 +1 PMC votes are cast.
[ ] +1 Release this package as Apache Spark 2.1.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.1.0-rc5 (cd0a08361e2526519e7c131c42116bf56fa62c76)

The list of resolved JIRA tickets is:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.0

The release files, including signatures, digests, etc. can be found at:
http://home.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1223/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-docs/

FAQ

How can I help test this release?

If you are a Spark user, you can help us test this release by taking an existing Spark workload, running it on this release candidate, and reporting any regressions.

What should happen to JIRA tickets still targeting 2.1.0?

Committers should look at those and triage.
Extremely important bug fixes, documentation, and API tweaks that impact compatibility should be worked on immediately. Everything else should be retargeted to 2.1.1 or 2.2.0.

What happened to RC3/RC4?

They had issues with the release packaging and as a result were skipped.

--
Herman van Hövell
Software Engineer
Databricks Inc.
hvanhov...@databricks.com
+31 6 420 590 27
databricks.com

--
Joseph Bradley
Software Engineer - Machine Learning
Databricks, Inc.

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

--
Regards,
Vaquar Khan
+1-224-436-0783
IT Architect / Lead Consultant
Greater Chicago
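For anyone helping to test the release candidate, the usual verification pattern for the release files and signing key linked in the vote email is a GPG signature check plus a digest check. A minimal sketch, using hypothetical stand-in file names rather than the real artifacts:

```shell
# Real verification would fetch the artifact, its .asc signature, and its
# digest from the release directory, import pwendell's key, and run:
#   gpg --verify <artifact>.asc <artifact>
# The digest check itself looks like this, demonstrated on a stand-in file:
echo "stand-in artifact" > artifact.tgz
sha512sum artifact.tgz > artifact.tgz.sha512
sha512sum -c artifact.tgz.sha512   # reports "artifact.tgz: OK" on a match
```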