+1

On Tue, Dec 22, 2015 at 8:10 PM, Denny Lee <denny.g....@gmail.com> wrote:
> +1
>
> On Tue, Dec 22, 2015 at 7:05 PM Aaron Davidson <ilike...@gmail.com> wrote:
>
>> +1
>>
>> On Tue, Dec 22, 2015 at 7:01 PM, Josh Rosen <joshro...@databricks.com> wrote:
>>
>>> +1
>>>
>>> On Tue, Dec 22, 2015 at 7:00 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>>>
>>>> +1
>>>>
>>>> On Wed, Dec 23, 2015 at 7:36 AM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> On Tue, Dec 22, 2015 at 12:10 PM, Michael Armbrust <mich...@databricks.com> wrote:
>>>>>
>>>>>> Please vote on releasing the following candidate as Apache Spark version 1.6.0!
>>>>>>
>>>>>> The vote is open until Friday, December 25, 2015 at 18:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast.
>>>>>>
>>>>>> [ ] +1 Release this package as Apache Spark 1.6.0
>>>>>> [ ] -1 Do not release this package because ...
>>>>>>
>>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>>
>>>>>> The tag to be voted on is v1.6.0-rc4 (4062cda3087ae42c6c3cb24508fc1d3a931accdf)
>>>>>> <https://github.com/apache/spark/tree/v1.6.0-rc4>
>>>>>>
>>>>>> The release files, including signatures, digests, etc. can be found at:
>>>>>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-bin/
>>>>>>
>>>>>> Release artifacts are signed with the following key:
>>>>>> https://people.apache.org/keys/committer/pwendell.asc
>>>>>>
>>>>>> The staging repository for this release can be found at:
>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1176/
>>>>>>
>>>>>> The test repository (versioned as v1.6.0-rc4) for this release can be found at:
>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1175/
>>>>>>
>>>>>> The documentation corresponding to this release can be found at:
>>>>>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-docs/
>>>>>>
>>>>>> =======================================
>>>>>> == How can I help test this release? ==
>>>>>> =======================================
>>>>>> If you are a Spark user, you can help us test this release by taking an existing Spark workload, running it on this release candidate, and reporting any regressions.
>>>>>>
>>>>>> ================================================
>>>>>> == What justifies a -1 vote for this release? ==
>>>>>> ================================================
>>>>>> This vote is happening towards the end of the 1.6 QA period, so -1 votes should only occur for significant regressions from 1.5. Bugs already present in 1.5, minor regressions, or bugs related to new features will not block this release.
>>>>>>
>>>>>> ===============================================================
>>>>>> == What should happen to JIRA tickets still targeting 1.6.0? ==
>>>>>> ===============================================================
>>>>>> 1. It is OK for documentation patches to target 1.6.0 and still go into branch-1.6, since documentation will be published separately from the release.
>>>>>> 2. New features for non-alpha modules should target 1.7+.
>>>>>> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target version.
>>>>>>
>>>>>> ==================================================
>>>>>> == Major changes to help you focus your testing ==
>>>>>> ==================================================
>>>>>>
>>>>>> Notable changes since 1.6 RC3
>>>>>>
>>>>>> - SPARK-12404 - Fix serialization error for Datasets with Timestamps/Arrays/Decimal
>>>>>> - SPARK-12218 - Fix incorrect pushdown of filters to Parquet
>>>>>> - SPARK-12395 - Fix join columns of outer join for DataFrame using
>>>>>> - SPARK-12413 - Fix Mesos HA
>>>>>>
>>>>>> Notable changes since 1.6 RC2
>>>>>>
>>>>>> - SPARK_VERSION has been set correctly
>>>>>> - SPARK-12199 - ML docs are publishing correctly
>>>>>> - SPARK-12345 - Mesos cluster mode has been fixed
>>>>>>
>>>>>> Notable changes since 1.6 RC1
>>>>>>
>>>>>> Spark Streaming
>>>>>>
>>>>>> - SPARK-2629 <https://issues.apache.org/jira/browse/SPARK-2629> trackStateByKey has been renamed to mapWithState
>>>>>>
>>>>>> Spark SQL
>>>>>>
>>>>>> - SPARK-12165 <https://issues.apache.org/jira/browse/SPARK-12165>, SPARK-12189 <https://issues.apache.org/jira/browse/SPARK-12189> Fix bugs in eviction of storage memory by execution.
>>>>>> - SPARK-12258 <https://issues.apache.org/jira/browse/SPARK-12258> Correct passing null into ScalaUDF
>>>>>>
>>>>>> Notable Features Since 1.5
>>>>>>
>>>>>> Spark SQL
>>>>>>
>>>>>> - SPARK-11787 <https://issues.apache.org/jira/browse/SPARK-11787> Parquet Performance - Improve Parquet scan performance when using flat schemas.
>>>>>> - SPARK-10810 <https://issues.apache.org/jira/browse/SPARK-10810> Session Management - Isolated default database (i.e. USE mydb) even on shared clusters.
>>>>>> - SPARK-9999 <https://issues.apache.org/jira/browse/SPARK-9999> Dataset API - A type-safe API (similar to RDDs) that performs many operations on serialized binary data and uses code generation (i.e. Project Tungsten).
>>>>>> - SPARK-10000 <https://issues.apache.org/jira/browse/SPARK-10000> Unified Memory Management - Shared memory for execution and caching instead of exclusive division of the regions.
>>>>>> - SPARK-11197 <https://issues.apache.org/jira/browse/SPARK-11197> SQL Queries on Files - Concise syntax for running SQL queries over files of any supported format without registering a table.
>>>>>> - SPARK-11745 <https://issues.apache.org/jira/browse/SPARK-11745> Reading non-standard JSON files - Added options to read non-standard JSON files (e.g. single quotes, unquoted attributes).
>>>>>> - SPARK-10412 <https://issues.apache.org/jira/browse/SPARK-10412> Per-operator Metrics for SQL Execution - Display statistics on a per-operator basis for memory usage and spilled data size.
>>>>>> - SPARK-11329 <https://issues.apache.org/jira/browse/SPARK-11329> Star (*) expansion for StructTypes - Makes it easier to nest and unnest arbitrary numbers of columns.
>>>>>> - SPARK-10917 <https://issues.apache.org/jira/browse/SPARK-10917>, SPARK-11149 <https://issues.apache.org/jira/browse/SPARK-11149> In-memory Columnar Cache Performance - Significant (up to 14x) speedup when caching data that contains complex types in DataFrames or SQL.
>>>>>> - SPARK-11111 <https://issues.apache.org/jira/browse/SPARK-11111> Fast null-safe joins - Joins using null-safe equality (<=>) will now execute using SortMergeJoin instead of computing a cartesian product.
>>>>>> - SPARK-11389 <https://issues.apache.org/jira/browse/SPARK-11389> SQL Execution Using Off-Heap Memory - Support for configuring query execution to occur using off-heap memory to avoid GC overhead.
>>>>>> - SPARK-10978 <https://issues.apache.org/jira/browse/SPARK-10978> Datasource API Avoid Double Filter - When implementing a data source with filter pushdown, developers can now tell Spark SQL to avoid double-evaluating a pushed-down filter.
>>>>>> - SPARK-4849 <https://issues.apache.org/jira/browse/SPARK-4849> Advanced Layout of Cached Data - Storing partitioning and ordering schemes in in-memory table scans, and adding distributeBy and localSort to the DataFrame API.
>>>>>> - SPARK-9858 <https://issues.apache.org/jira/browse/SPARK-9858> Adaptive query execution - Initial support for automatically selecting the number of reducers for joins and aggregations.
>>>>>> - SPARK-9241 <https://issues.apache.org/jira/browse/SPARK-9241> Improved query planner for queries having distinct aggregations - Query plans of distinct aggregations are more robust when distinct columns have high cardinality.
>>>>>>
>>>>>> Spark Streaming
>>>>>>
>>>>>> - API Updates
>>>>>>   - SPARK-2629 <https://issues.apache.org/jira/browse/SPARK-2629> New improved state management - mapWithState, a DStream transformation for stateful stream processing, supersedes updateStateByKey in functionality and performance.
>>>>>>   - SPARK-11198 <https://issues.apache.org/jira/browse/SPARK-11198> Kinesis record deaggregation - Kinesis streams have been upgraded to use KCL 1.4.0 and support transparent deaggregation of KPL-aggregated records.
>>>>>>   - SPARK-10891 <https://issues.apache.org/jira/browse/SPARK-10891> Kinesis message handler function - Allows an arbitrary function to be applied to a Kinesis record in the Kinesis receiver to customize what data is to be stored in memory.
>>>>>>   - SPARK-6328 <https://issues.apache.org/jira/browse/SPARK-6328> Python Streaming Listener API - Get streaming statistics (scheduling delays, batch processing times, etc.) in streaming.
>>>>>>
>>>>>> - UI Improvements
>>>>>>   - Made failures visible in the streaming tab, in the timelines, batch list, and batch details page.
>>>>>>   - Made output operations visible in the streaming tab as progress bars.
>>>>>>
>>>>>> MLlib
>>>>>>
>>>>>> New algorithms/models
>>>>>>
>>>>>> - SPARK-8518 <https://issues.apache.org/jira/browse/SPARK-8518> Survival analysis - Log-linear model for survival analysis
>>>>>> - SPARK-9834 <https://issues.apache.org/jira/browse/SPARK-9834> Normal equation for least squares - Normal equation solver, providing R-like model summary statistics
>>>>>> - SPARK-3147 <https://issues.apache.org/jira/browse/SPARK-3147> Online hypothesis testing - A/B testing in the Spark Streaming framework
>>>>>> - SPARK-9930 <https://issues.apache.org/jira/browse/SPARK-9930> New feature transformers - ChiSqSelector, QuantileDiscretizer, SQL transformer
>>>>>> - SPARK-6517 <https://issues.apache.org/jira/browse/SPARK-6517> Bisecting K-Means clustering - Fast top-down clustering variant of K-Means
>>>>>>
>>>>>> API improvements
>>>>>>
>>>>>> - ML Pipelines
>>>>>>   - SPARK-6725 <https://issues.apache.org/jira/browse/SPARK-6725> Pipeline persistence - Save/load for ML Pipelines, with partial coverage of spark.ml algorithms
>>>>>>   - SPARK-5565 <https://issues.apache.org/jira/browse/SPARK-5565> LDA in ML Pipelines - API for Latent Dirichlet Allocation in ML Pipelines
>>>>>> - R API
>>>>>>   - SPARK-9836 <https://issues.apache.org/jira/browse/SPARK-9836> R-like statistics for GLMs - (Partial) R-like stats for ordinary least squares via summary(model)
>>>>>>   - SPARK-9681 <https://issues.apache.org/jira/browse/SPARK-9681> Feature interactions in R formula - Interaction operator ":" in R formula
>>>>>> - Python API - Many improvements to the Python API to approach feature parity
>>>>>>
>>>>>> Misc improvements
>>>>>>
>>>>>> - SPARK-7685 <https://issues.apache.org/jira/browse/SPARK-7685>, SPARK-9642 <https://issues.apache.org/jira/browse/SPARK-9642> Instance weights for GLMs - Logistic and Linear Regression can take instance weights
>>>>>> - SPARK-10384 <https://issues.apache.org/jira/browse/SPARK-10384>, SPARK-10385 <https://issues.apache.org/jira/browse/SPARK-10385> Univariate and bivariate statistics in DataFrames - Variance, stddev, correlations, etc.
>>>>>> - SPARK-10117 <https://issues.apache.org/jira/browse/SPARK-10117> LIBSVM data source - LIBSVM as a SQL data source
>>>>>>
>>>>>> Documentation improvements
>>>>>>
>>>>>> - SPARK-7751 <https://issues.apache.org/jira/browse/SPARK-7751> @since versions - Documentation includes the initial version when classes and methods were added
>>>>>> - SPARK-11337 <https://issues.apache.org/jira/browse/SPARK-11337> Testable example code - Automated testing for code in user guide examples
>>>>>>
>>>>>> Deprecations
>>>>>>
>>>>>> - In spark.mllib.clustering.KMeans, the "runs" parameter has been deprecated.
>>>>>> - In spark.ml.classification.LogisticRegressionModel and spark.ml.regression.LinearRegressionModel, the "weights" field has been deprecated in favor of the new name "coefficients." This helps disambiguate it from the instance (row) weights given to algorithms.
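[Editor's note: the "Instance weights for GLMs" item above (SPARK-7685, SPARK-9642) boils down to weighted least squares. A minimal plain-Python sketch of the 1-D weighted case, for intuition only — this is not the spark.ml API:]

```python
def weighted_least_squares(xs, ys, ws):
    """Closed-form slope/intercept for 1-D weighted linear regression.

    Minimizes sum(w_i * (y_i - (slope * x_i + intercept))**2), i.e. each
    row counts in proportion to its instance weight. Plain-Python
    illustration of what instance weights mean; NOT Spark code.
    """
    total_weight = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / total_weight
    ybar = sum(w * y for w, y in zip(ws, ys)) / total_weight
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return slope, intercept

# With uniform weights this reduces to ordinary least squares; a weight
# of 2 on a row is equivalent to including that row twice.
print(weighted_least_squares([0, 1, 2, 3], [1, 3, 5, 7], [1, 1, 1, 1]))
```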
>>>>>>
>>>>>> Changes of behavior
>>>>>>
>>>>>> - spark.mllib.tree.GradientBoostedTrees validationTol has changed semantics in 1.6. Previously, it was a threshold for absolute change in error. Now, it resembles the behavior of GradientDescent convergenceTol: for large errors, it uses relative error (relative to the previous error); for small errors (< 0.01), it uses absolute error.
>>>>>> - spark.ml.feature.RegexTokenizer: Previously, it did not convert strings to lowercase before tokenizing. Now, it converts to lowercase by default, with an option not to. This matches the behavior of the simpler Tokenizer transformer.
>>>>>> - Spark SQL's partition discovery has been changed to only discover partition directories that are children of the given path. (i.e. if path="/my/data/x=1", then x=1 will no longer be considered a partition, but only children of x=1 will.) This behavior can be overridden by manually specifying the basePath that partition discovery should start with (SPARK-11678 <https://issues.apache.org/jira/browse/SPARK-11678>).
>>>>>> - When casting a value of an integral type to timestamp (e.g. casting a long value to timestamp), the value is treated as being in seconds instead of milliseconds (SPARK-11724 <https://issues.apache.org/jira/browse/SPARK-11724>).
>>>>>> - With the improved query planner for queries having distinct aggregations (SPARK-9241 <https://issues.apache.org/jira/browse/SPARK-9241>), the plan of a query having a single distinct aggregation has been changed to a more robust version. To switch back to the plan generated by Spark 1.5's planner, please set spark.sql.specializeSingleDistinctAggPlanning to true (SPARK-12077 <https://issues.apache.org/jira/browse/SPARK-12077>).
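[Editor's note: the integral-to-timestamp change (SPARK-11724) is easy to trip over when upgrading. A plain-Python illustration of the arithmetic — the new seconds-based interpretation versus the old milliseconds reading; this is not Spark code:]

```python
from datetime import datetime, timezone

def cast_integral_to_timestamp(value):
    # Spark 1.6 semantics (SPARK-11724): an integral value cast to
    # timestamp is interpreted as SECONDS since the Unix epoch.
    return datetime.fromtimestamp(value, tz=timezone.utc)

def old_milliseconds_reading(value):
    # The pre-1.6 reading treated the same value as milliseconds.
    return datetime.fromtimestamp(value / 1000, tz=timezone.utc)

# One day's worth of seconds lands a full day after the epoch under the
# new reading, but only ~86 seconds after it under the old one.
print(cast_integral_to_timestamp(86400))  # 1970-01-02 00:00:00+00:00
print(old_milliseconds_reading(86400))    # 1970-01-01 00:01:26.400000+00:00
```

So the same stored long now maps to a point on the timeline roughly 1000x further from the epoch; workloads that relied on the millisecond interpretation need to divide by 1000 before casting.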
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Jeff Zhang
>>>
>>