Please vote on releasing the following candidate as Apache Spark (incubating) version 0.8.1.

The tag to be voted on is v0.8.1-incubating (commit bf23794a):
https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=tag;h=e6ba91b5a7527316202797fc3dce469ff86cf203

The release files, including signatures, digests, etc can be found at:
http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-024/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2-docs/

For information about the contents of this release see:
<attached> draft of release notes
<attached> draft of release credits
https://github.com/apache/incubator-spark/blob/branch-0.8/CHANGES.txt

Please vote on releasing this package as Apache Spark 0.8.1-incubating!

The vote is open until Wednesday, December 11th at 21:00 UTC and passes if a majority of at least 3 +1 PPMC votes are cast.

[ ] +1 Release this package as Apache Spark 0.8.1-incubating
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.incubator.apache.org/
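
The passing condition for this vote can be sketched as a small check. This is only an illustration, assuming that "majority" means strictly more +1 than -1 PPMC votes; the function name `vote_passes` and the example tallies are hypothetical and not part of any Apache tooling.

```python
def vote_passes(plus_ones: int, minus_ones: int, minimum_plus_ones: int = 3) -> bool:
    """Return True if a vote passes under the rule sketched here.

    Assumptions (not stated in full by the vote email):
    - only PPMC votes are counted;
    - "majority" means strictly more +1 votes than -1 votes.
    """
    if plus_ones < 0 or minus_ones < 0:
        raise ValueError("vote counts must be non-negative")
    return plus_ones >= minimum_plus_ones and plus_ones > minus_ones


if __name__ == "__main__":
    # Hypothetical tallies, for illustration only.
    for plus, minus in [(3, 0), (2, 0), (4, 4), (5, 2)]:
        print(f"+1={plus} -1={minus} passes={vote_passes(plus, minus)}")
```

Under these assumptions, three +1 votes with no -1 votes pass, while two +1 votes do not, regardless of how many -1 votes are cast.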

Michael Armbrust -- build fix
Pierre Borckmans -- typo fix in documentation
Evan Chan -- added `local://` scheme for dependency jars
Ewen Cheslack-Postava -- `add` method for python accumulators, support for setting config properties in python
Mosharaf Chowdhury -- optimized broadcast implementation
Frank Dai -- documentation fix
Aaron Davidson -- lead on shuffle file consolidation, lead on h/a mode for standalone scheduler, cleaned up representation of block IDs, several small improvements and bug fixes
Tathagata Das -- new streaming operators: `transformWith`, `leftInnerJoin`, and `rightOuterJoin`, fix for kafka concurrency bug
Ankur Dave -- support for pausing spot clusters on EC2
Harvey Feng -- optimization to JobConf broadcasts, minor fixes, lead on YARN 2.2 build
Ali Ghodsi -- scheduler support for SIMR, lead on YARN 2.2 build
Thomas Graves -- lead on Spark YARN integration including secure HDFS access over YARN
Li Guoqiang -- fix for maven build
Stephen Haberman -- bug fix
Haidar Hadi -- documentation fix
Nathan Howell -- bug fix relating to YARN
Holden Karau -- java version of `mapPartitionsWithIndex`
Du Li -- bug fix in make-distribution.sh
Xi Lui -- bug fix and code clean-up
David McCauley -- bug fix in standalone mode JSON output
Michael (wannabeast) -- bug fix in memory store
Fabrizio Milo -- typos in documentation, minor clean-up in DAGScheduler, typo in scaladoc
Mridul Muralidharan -- fixes to meta-data cleaner and speculative scheduler
Sundeep Narravula -- build fix, bug fixes in scheduler and tests, minor code clean-up
Kay Ousterhout -- optimization to task result fetching, extensive code clean-up and refactoring (task schedulers, thread pools), result-fetching state in UI, showing task and attempt ID in UI, several bug fixes in scheduler, UI, and unit tests
Nick Pentreath -- implicit feedback variant of ALS algorithm
Imran Rashid -- small improvement to executor launch
Ahir Reddy -- spark support for SIMR
Josh Rosen -- reduced memory overhead for BlockInfo objects, clean up of BlockManager code, fix to java API auditor, code clean-up in java API, and bug fixes in python API
Henry Saputra -- build fix
Jerry Shao -- refactoring of fair scheduler, support for running spark as a specific user, bug fix
Mingfei Shi -- documentation for JobLogger
Andre Schumacher -- sortByKey in pyspark and associated changes
Karthik Tunga -- bug fix in launch script
Patrick Wendell -- added `repartition` operator, logging improvements, instrumentation for shuffle write, documentation improvements, fix for streaming example, and release management
Neal Wiggins -- minor import clean-up, documentation typo
Andrew Xia -- bug fix in UI
Reynold Xin -- optimized hash set and hash tables for primitive types, task killing, support for setting job properties in repl, logging improvements, Kryo improvements, several bug fixes, and general clean-up
Matei Zaharia -- optimized hashmap for shuffle data, pyspark documentation, optimizations to kryo and chill serializers
Wu Zeming -- bug fix in executors UI

DRAFT OF RELEASE NOTES FOR SPARK 0.8.1

Apache Spark 0.8.1 is a maintenance release including several bug fixes and performance optimizations. It also includes a few new features. Contributions to 0.8.1 came from 40 developers.

== High availability mode for standalone scheduler ==
The standalone scheduler now has a High Availability (H/A) mode which can tolerate master failures. This is particularly useful for long-running applications such as streaming jobs and the shark server, where the scheduler master previously represented a single point of failure. Instructions for deploying H/A mode are included in the documentation. The current implementation uses Zookeeper for coordination.

== YARN 2.2 support ==
Support has been added for submitting Spark applications to YARN 2.2 and newer. Due to a dependency conflict, this did not work properly in Spark 0.8.0 and earlier. See the release documentation for specific instructions on how to build Spark for YARN 2.2+.

== Internal Optimizations ==
This release adds several performance optimizations:
- Append-only map for shuffle - an internal hashmap optimized for storing shuffle data
- Efficient encoding for Job confs - improves latency for stages reading large numbers of blocks from HDFS, S3, and HBase
- Shuffle file consolidation (off by default) - reduces the number of files created in large shuffles for better filesystem performance. We recommend users turn this on unless they are using ext3.
- Torrent broadcast (off by default) - reduces network overhead and latency of broadcasting large objects.
- Support for fetching large result sets - allows tasks to return large results without tuning akka buffer sizes.
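
To give a feel for why shuffle file consolidation helps the filesystem, the file counts can be estimated with a rough model. This sketch assumes that, without consolidation, each map task writes one file per reduce partition, and that with consolidation files are shared by map tasks running on the same core, so the count scales with the number of cores rather than the number of map tasks. The function names and example numbers are hypothetical, and the model ignores details of the actual implementation.

```python
def shuffle_files_unconsolidated(map_tasks: int, reduce_partitions: int) -> int:
    """Estimated file count when every map task writes one file per reducer.

    Assumption: this is a simplified model, not a measurement.
    """
    return map_tasks * reduce_partitions


def shuffle_files_consolidated(total_cores: int, map_tasks: int,
                               reduce_partitions: int) -> int:
    """Estimated file count when map tasks on the same core share files.

    Assumption: at most one group of files per core, and never more
    groups than there are map tasks.
    """
    return min(total_cores, map_tasks) * reduce_partitions


if __name__ == "__main__":
    # Hypothetical job: 10,000 map tasks, 1,000 reducers, 160 cores.
    before = shuffle_files_unconsolidated(10_000, 1_000)
    after = shuffle_files_consolidated(160, 10_000, 1_000)
    print(f"without consolidation: {before} files")
    print(f"with consolidation:    {after} files")
```

In this model the benefit grows with the ratio of map tasks to cores; a job with no more map tasks than cores would see no reduction.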

== Python improvements ==
- new `add` method for accumulators
- it is now possible to set config properties directly from python
- python now supports sorted RDDs

== New operators and usability improvements ==
- local:// URIs - allow users to specify jars already present on slaves as dependencies
- a new “result fetching” state has been added to the UI
- new spark streaming operators: transformWith, leftInnerJoin, rightOuterJoin
- new spark operators: repartition