Please vote on releasing the following candidate as Apache Spark
(incubating) version 0.8.1.

The tag to be voted on is v0.8.1-incubating (commit bf23794a):
https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=tag;h=e6ba91b5a7527316202797fc3dce469ff86cf203

The release files, including signatures, digests, etc., can be found at:
http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-024/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc2-docs/

For information about the contents of this release, see:
<attached> draft of release notes
<attached> draft of release credits
https://github.com/apache/incubator-spark/blob/branch-0.8/CHANGES.txt

Please vote on releasing this package as Apache Spark 0.8.1-incubating!

The vote is open until Wednesday, December 11th at 21:00 UTC and
passes if a majority of at least 3 +1 PPMC votes are cast.

[ ] +1 Release this package as Apache Spark 0.8.1-incubating
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.incubator.apache.org/

DRAFT OF RELEASE CREDITS FOR SPARK 0.8.1

Michael Armbrust -- build fix

Pierre Borckmans -- typo fix in documentation

Evan Chan -- added `local://` scheme for dependency jars

Ewen Cheslack-Postava -- `add` method for Python accumulators, support for
setting config properties in Python

Mosharaf Chowdhury -- optimized broadcast implementation

Frank Dai -- documentation fix

Aaron Davidson -- lead on shuffle file consolidation, lead on H/A mode for
standalone scheduler, cleaned up representation of block IDs, several small
improvements and bug fixes

Tathagata Das -- new streaming operators: `transformWith`, `leftOuterJoin`, and
`rightOuterJoin`, fix for Kafka concurrency bug

Ankur Dave -- support for pausing spot clusters on EC2

Harvey Feng -- optimization to JobConf broadcasts, minor fixes, lead on YARN 
2.2 build

Ali Ghodsi -- scheduler support for SIMR, lead on YARN 2.2 build

Thomas Graves -- lead on Spark YARN integration including secure HDFS access 
over YARN

Li Guoqiang -- fix for Maven build

Stephen Haberman -- bug fix

Haidar Hadi -- documentation fix

Nathan Howell -- bug fix relating to YARN

Holden Karau -- Java version of `mapPartitionsWithIndex`

Du Li -- bug fix in make-distribution.sh

Xi Lui -- bug fix and code clean-up

David McCauley -- bug fix in standalone mode JSON output

Michael (wannabeast) -- bug fix in memory store

Fabrizio Milo -- typos in documentation, minor clean-up in DAGScheduler, typo 
in scaladoc

Mridul Muralidharan -- fixes to meta-data cleaner and speculative scheduler

Sundeep Narravula -- build fix, bug fixes in scheduler and tests, minor code 
clean-up

Kay Ousterhout -- optimization to task result fetching, extensive code clean-up
and refactoring (task schedulers, thread pools), result-fetching state in UI,
showing task and attempt ID in UI, several bug fixes in scheduler, UI, and unit
tests

Nick Pentreath -- implicit feedback variant of ALS algorithm

Imran Rashid -- small improvement to executor launch

Ahir Reddy -- Spark support for SIMR

Josh Rosen -- reduced memory overhead for BlockInfo objects, clean-up of
BlockManager code, fix to Java API auditor, code clean-up in Java API, and bug
fixes in Python API

Henry Saputra -- build fix

Jerry Shao -- refactoring of fair scheduler, support for running Spark as a
specific user, bug fix

Mingfei Shi -- documentation for JobLogger

Andre Schumacher -- `sortByKey` in PySpark and associated changes

Karthik Tunga -- bug fix in launch script

Patrick Wendell -- added `repartition` operator, logging improvements, 
instrumentation for shuffle write, documentation improvements, fix for 
streaming example, and release management

Neal Wiggins -- minor import clean-up, documentation typo

Andrew Xia -- bug fix in UI

Reynold Xin -- optimized hash set and hash tables for primitive types, task
killing, support for setting job properties in REPL, logging improvements, Kryo
improvements, several bug fixes, and general clean-up

Matei Zaharia -- optimized hash map for shuffle data, PySpark documentation,
optimizations to Kryo and Chill serializers

Wu Zeming -- bug fix in executors UI

DRAFT OF RELEASE NOTES FOR SPARK 0.8.1

Apache Spark 0.8.1 is a maintenance release including several bug fixes and 
performance optimizations. It also includes a few new features. Contributions 
to 0.8.1 came from 40 developers.

== High availability mode for standalone scheduler ==
The standalone scheduler now has a High Availability (H/A) mode which can
tolerate master failures. This is particularly useful for long-running
applications such as streaming jobs and the Shark server, where the scheduler
master previously represented a single point of failure. Instructions for
deploying H/A mode are included in the documentation. The current
implementation uses ZooKeeper for coordination.
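The sketch below illustrates the client-side half of H/A mode under stated assumptions. We assume applications are given a comma-separated list of masters (the `spark://host1:port1,host2:port2` form; please check the H/A section of the standalone docs for the exact syntax). The helpers `parse_master_list` and `find_leader` are hypothetical and not part of Spark. Leader election itself is left to ZooKeeper, so the sketch only models a client that looks for whichever master currently leads. The real client may contact masters differently, for example concurrently rather than in order.

```python
from typing import Callable, List, Tuple

Master = Tuple[str, int]


def parse_master_list(url: str) -> List[Master]:
    """Split a multi-master URL like 'spark://host1:7077,host2:7077' into (host, port) pairs."""
    prefix = "spark://"
    if not url.startswith(prefix):
        raise ValueError("expected a spark:// URL, got " + repr(url))
    masters: List[Master] = []
    for entry in url[len(prefix):].split(","):
        host, _, port = entry.strip().rpartition(":")
        if not host or not port.isdigit():
            raise ValueError("malformed master entry " + repr(entry))
        masters.append((host, int(port)))
    return masters


def find_leader(masters: List[Master], is_leader: Callable[[str, int], bool]) -> Master:
    """Return the first master that reports itself as the current leader.

    `is_leader` is a hypothetical stand-in for a network call; in Spark the
    election itself happens through ZooKeeper, not in the client.
    """
    for host, port in masters:
        if is_leader(host, port):
            return host, port
    raise RuntimeError("no master is currently the leader")


# Usage: pretend host1 has failed and host2 has taken over as leader.
masters = parse_master_list("spark://host1:7077,host2:7077")
leader = find_leader(masters, lambda host, port: host == "host2")
print(leader)  # ('host2', 7077)
```

Because every master is listed up front, an application does not need to be reconfigured when leadership moves from one master to another.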

== YARN 2.2 support ==
Support has been added for submitting Spark applications to YARN 2.2 and newer. 
Due to a dependency conflict, this did not work properly in Spark 0.8.0 and 
earlier. See the release documentation for specific instructions on how to 
build Spark for YARN 2.2+.

== Internal Optimizations ==
This release adds several performance optimizations:
  - Append-only map for shuffle - an internal hash map optimized for storing
shuffle data
  - Efficient encoding for JobConfs - reduces latency for stages reading
large numbers of blocks from HDFS, S3, and HBase
  - Shuffle file consolidation (off by default) - reduces the number of files 
created in large shuffles for better filesystem performance. We recommend users 
turn this on unless they are using ext3.
  - Torrent broadcast (off by default) - reduces network overhead and latency 
of broadcasting large objects.
  - Support for fetching large result sets - allows tasks to return large
results without tuning Akka buffer sizes.
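The recommendation to enable shuffle file consolidation comes down to file counts. The following back-of-the-envelope model makes that concrete. It assumes that, with consolidation on, each executor keeps roughly one group of per-reducer files for each core, which map tasks running one after another on that core append to. This is a simplification; the real bookkeeping may differ. We believe the switch in this release is the `spark.shuffle.consolidateFiles` property, but please verify the name against the 0.8.1 configuration page. The function names are illustrative only.

```python
def files_without_consolidation(map_tasks: int, reducers: int) -> int:
    # Every map task writes its own file for each reduce partition.
    return map_tasks * reducers


def files_with_consolidation(executors: int, cores_per_executor: int, reducers: int) -> int:
    # Map tasks that run one after another on the same core append to a shared
    # group of per-reducer files, so each executor keeps roughly one group per core.
    return executors * cores_per_executor * reducers


before = files_without_consolidation(map_tasks=1000, reducers=1000)
after = files_with_consolidation(executors=10, cores_per_executor=8, reducers=1000)
print(before, after)  # 1000000 80000
```

Under these assumptions, the file count stops growing with the number of map tasks. That is the property that matters on filesystems that slow down when a directory holds very many small files.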

== Python improvements ==
  - new `add` method for accumulators
  - it is now possible to set config properties directly from Python
  - Python now supports sorting RDDs with `sortByKey`

== New operators and usability improvements ==
- `local://` URIs - allow users to specify files already present on the slaves
as dependencies
- a new “result fetching” state has been added to the UI
- new Spark Streaming operators: `transformWith`, `leftOuterJoin`, `rightOuterJoin`
- new Spark operator: `repartition`
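The `local://` scheme changes how a dependency is distributed. The Python sketch below models that decision as we understand it, using a hypothetical `plan_dependency` helper that is not Spark's code. A `local:` URI names a file that every worker already has at that path, so nothing is copied. Any other URI is treated as something that has to be shipped to the workers.

```python
from typing import Tuple
from urllib.parse import urlparse


def plan_dependency(uri: str) -> Tuple[str, str]:
    """Hypothetical helper: decide how a dependency jar reaches the workers."""
    parsed = urlparse(uri)
    if parsed.scheme == "local":
        # The jar already exists at this path on every worker, so it is used
        # in place and nothing is uploaded.
        return ("use-in-place", parsed.path)
    # Any other URI (plain path, file:, hdfs:, http:) must be shipped to the workers.
    return ("ship", uri)


print(plan_dependency("local:///opt/jars/dep.jar"))  # ('use-in-place', '/opt/jars/dep.jar')
print(plan_dependency("/home/alice/app.jar"))        # ('ship', '/home/alice/app.jar')
```

Note the three slashes in the first example. With `local://opt/jars/dep.jar`, a generic URI parser such as `urlparse` treats `opt` as a host name rather than as part of the path.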
