+1

On Tue, Dec 22, 2015 at 7:05 PM Aaron Davidson <ilike...@gmail.com> wrote:
+1

On Tue, Dec 22, 2015 at 7:01 PM, Josh Rosen <joshro...@databricks.com> wrote:

+1

On Tue, Dec 22, 2015 at 7:00 PM, Jeff Zhang <zjf...@gmail.com> wrote:

+1

--
Best Regards

Jeff Zhang

On Wed, Dec 23, 2015 at 7:36 AM, Mark Hamstra <m...@clearstorydata.com> wrote:

+1

On Tue, Dec 22, 2015 at 12:10 PM, Michael Armbrust <mich...@databricks.com> wrote:

Please vote on releasing the following candidate as Apache Spark version 1.6.0!

The vote is open until Friday, December 25, 2015 at 18:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.6.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v1.6.0-rc4 (4062cda3087ae42c6c3cb24508fc1d3a931accdf)
<https://github.com/apache/spark/tree/v1.6.0-rc4>

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1176/

The test repository (versioned as v1.6.0-rc4) for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1175/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-docs/

=======================================
== How can I help test this release? ==
=======================================
If you are a Spark user, you can help us test this release by taking an existing Spark workload, running it on this release candidate, and reporting any regressions (see the sbt sketch below for one way to pull the staged artifacts).

================================================
== What justifies a -1 vote for this release? ==
================================================
This vote is happening towards the end of the 1.6 QA period, so -1 votes should only occur for significant regressions from 1.5. Bugs already present in 1.5, minor regressions, and bugs related to new features will not block this release.

===============================================================
== What should happen to JIRA tickets still targeting 1.6.0? ==
===============================================================
1. It is OK for documentation patches to target 1.6.0 and still go into branch-1.6, since documentation will be published separately from the release.
2. New features for non-alpha modules should target 1.7+.
3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target version.
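For sbt users, here is a minimal sketch of a build that pulls the staged artifacts for smoke testing. Only the resolver URL comes from this thread; the Scala version and dependency list are assumptions about your own project:

    // build.sbt -- minimal sketch for testing the release candidate.
    // The resolver URL is the staging repository above; everything else
    // is a placeholder for your own workload's build.
    scalaVersion := "2.10.5"

    resolvers += "Spark 1.6.0 RC4 staging" at
      "https://repository.apache.org/content/repositories/orgapachespark-1176/"

    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"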
==================================================
== Major changes to help you focus your testing ==
==================================================

Notable changes since 1.6 RC3

- SPARK-12404 - Fix serialization error for Datasets with Timestamps/Arrays/Decimal
- SPARK-12218 - Fix incorrect pushdown of filters to parquet
- SPARK-12395 - Fix join columns of outer join for DataFrame using columns
- SPARK-12413 - Fix Mesos HA

Notable changes since 1.6 RC2

- SPARK_VERSION has been set correctly
- SPARK-12199 - ML docs are publishing correctly
- SPARK-12345 - Mesos cluster mode has been fixed

Notable changes since 1.6 RC1

Spark Streaming

- SPARK-2629 <https://issues.apache.org/jira/browse/SPARK-2629> trackStateByKey has been renamed to mapWithState

Spark SQL

- SPARK-12165 <https://issues.apache.org/jira/browse/SPARK-12165> SPARK-12189 <https://issues.apache.org/jira/browse/SPARK-12189> Fix bugs in eviction of storage memory by execution.
- SPARK-12258 <https://issues.apache.org/jira/browse/SPARK-12258> Correct passing null into ScalaUDF

Notable Features Since 1.5

Spark SQL

- SPARK-11787 <https://issues.apache.org/jira/browse/SPARK-11787> Parquet Performance - Improve Parquet scan performance when using flat schemas.
- SPARK-10810 <https://issues.apache.org/jira/browse/SPARK-10810> Session Management - Isolated default database (i.e., USE mydb) even on shared clusters.
- SPARK-9999 <https://issues.apache.org/jira/browse/SPARK-9999> Dataset API - A type-safe API (similar to RDDs) that performs many operations on serialized binary data and code generation (i.e., Project Tungsten). A short sketch of the API follows this list.
- SPARK-10000 <https://issues.apache.org/jira/browse/SPARK-10000> Unified Memory Management - Shared memory for execution and caching instead of exclusive division of the regions.
- SPARK-11197 <https://issues.apache.org/jira/browse/SPARK-11197> SQL Queries on Files - Concise syntax for running SQL queries over files of any supported format without registering a table.
- SPARK-11745 <https://issues.apache.org/jira/browse/SPARK-11745> Reading non-standard JSON files - Added options to read non-standard JSON files (e.g. single quotes, unquoted attributes).
- SPARK-10412 <https://issues.apache.org/jira/browse/SPARK-10412> Per-operator Metrics for SQL Execution - Display statistics on a per-operator basis for memory usage and spilled data size.
- SPARK-11329 <https://issues.apache.org/jira/browse/SPARK-11329> Star (*) expansion for StructTypes - Makes it easier to nest and unnest arbitrary numbers of columns.
- SPARK-10917 <https://issues.apache.org/jira/browse/SPARK-10917>, SPARK-11149 <https://issues.apache.org/jira/browse/SPARK-11149> In-memory Columnar Cache Performance - Significant (up to 14x) speedup when caching data that contains complex types in DataFrames or SQL.
- SPARK-11111 <https://issues.apache.org/jira/browse/SPARK-11111> Fast null-safe joins - Joins using null-safe equality (<=>) will now execute using SortMergeJoin instead of computing a cartesian product.
- SPARK-11389 <https://issues.apache.org/jira/browse/SPARK-11389> SQL Execution Using Off-Heap Memory - Support for configuring query execution to occur using off-heap memory to avoid GC overhead.
- SPARK-10978 <https://issues.apache.org/jira/browse/SPARK-10978> Datasource API Avoid Double Filter - When implementing a datasource with filter pushdown, developers can now tell Spark SQL to avoid double evaluating a pushed-down filter.
- SPARK-4849 <https://issues.apache.org/jira/browse/SPARK-4849> Advanced Layout of Cached Data - Storing partitioning and ordering schemes in in-memory table scan, and adding distributeBy and localSort to the DataFrame API.
- SPARK-9858 <https://issues.apache.org/jira/browse/SPARK-9858> Adaptive query execution - Initial support for automatically selecting the number of reducers for joins and aggregations.
- SPARK-9241 <https://issues.apache.org/jira/browse/SPARK-9241> Improved query planner for queries having distinct aggregations - Query plans of distinct aggregations are more robust when distinct columns have high cardinality.
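As a quick illustration of the Dataset API above, here is a minimal sketch against the 1.6 API; the Person case class and its sample data are invented for this example:

    // Minimal Dataset API sketch (Spark 1.6); Person and its data are made up.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    case class Person(name: String, age: Long)

    object DatasetExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("ds-example").setMaster("local[*]"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // Datasets are built from ordinary Scala objects and keep their
        // static types, unlike DataFrames.
        val ds = Seq(Person("Alice", 29), Person("Bob", 35)).toDS()

        // Transformations take typed lambdas; under the covers Spark still
        // operates on serialized binary data (Project Tungsten).
        val names = ds.filter(_.age >= 30).map(_.name)
        names.collect().foreach(println)
      }
    }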
Spark Streaming

- API Updates
  - SPARK-2629 <https://issues.apache.org/jira/browse/SPARK-2629> New improved state management - mapWithState - a DStream transformation for stateful stream processing; supersedes updateStateByKey in functionality and performance. A sketch of the API follows this section.
  - SPARK-11198 <https://issues.apache.org/jira/browse/SPARK-11198> Kinesis record deaggregation - Kinesis streams have been upgraded to use KCL 1.4.0 and support transparent deaggregation of KPL-aggregated records.
  - SPARK-10891 <https://issues.apache.org/jira/browse/SPARK-10891> Kinesis message handler function - Allows an arbitrary function to be applied to a Kinesis record in the Kinesis receiver to customize what data is stored in memory.
  - SPARK-6328 <https://issues.apache.org/jira/browse/SPARK-6328> Python Streaming Listener API - Get streaming statistics (scheduling delays, batch processing times, etc.) from Python.
- UI Improvements
  - Made failures visible in the streaming tab, in the timelines, batch list, and batch details page.
  - Made output operations visible in the streaming tab as progress bars.
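A minimal mapWithState sketch against the 1.6 streaming API; the socket source, checkpoint directory, and word-count state logic are placeholders:

    // mapWithState sketch (Spark 1.6); the source and state logic are made up.
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

    object MapWithStateExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("map-with-state").setMaster("local[2]")
        val ssc = new StreamingContext(conf, Seconds(1))
        ssc.checkpoint("/tmp/checkpoint") // stateful operations need checkpointing

        val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))

        // For each key, fold the new value into the running count held in State.
        val updateCount = (word: String, one: Option[Int], state: State[Int]) => {
          val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
          state.update(sum)
          (word, sum)
        }

        val counts = words.map((_, 1)).mapWithState(StateSpec.function(updateCount))
        counts.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }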
MLlib

New algorithms/models

- SPARK-8518 <https://issues.apache.org/jira/browse/SPARK-8518> Survival analysis - Log-linear model for survival analysis
- SPARK-9834 <https://issues.apache.org/jira/browse/SPARK-9834> Normal equation for least squares - Normal equation solver, providing R-like model summary statistics
- SPARK-3147 <https://issues.apache.org/jira/browse/SPARK-3147> Online hypothesis testing - A/B testing in the Spark Streaming framework
- SPARK-9930 <https://issues.apache.org/jira/browse/SPARK-9930> New feature transformers - ChiSqSelector, QuantileDiscretizer, SQL transformer
- SPARK-6517 <https://issues.apache.org/jira/browse/SPARK-6517> Bisecting K-Means clustering - Fast top-down clustering variant of K-Means

API improvements

- ML Pipelines
  - SPARK-6725 <https://issues.apache.org/jira/browse/SPARK-6725> Pipeline persistence - Save/load for ML Pipelines, with partial coverage of spark.ml algorithms
  - SPARK-5565 <https://issues.apache.org/jira/browse/SPARK-5565> LDA in ML Pipelines - API for Latent Dirichlet Allocation in ML Pipelines
- R API
  - SPARK-9836 <https://issues.apache.org/jira/browse/SPARK-9836> R-like statistics for GLMs - (Partial) R-like stats for ordinary least squares via summary(model)
  - SPARK-9681 <https://issues.apache.org/jira/browse/SPARK-9681> Feature interactions in R formula - Interaction operator ":" in R formula
- Python API - Many improvements to the Python API to approach feature parity

Misc improvements

- SPARK-7685 <https://issues.apache.org/jira/browse/SPARK-7685>, SPARK-9642 <https://issues.apache.org/jira/browse/SPARK-9642> Instance weights for GLMs - Logistic and Linear Regression can take instance weights
- SPARK-10384 <https://issues.apache.org/jira/browse/SPARK-10384>, SPARK-10385 <https://issues.apache.org/jira/browse/SPARK-10385> Univariate and bivariate statistics in DataFrames - Variance, stddev, correlations, etc.
- SPARK-10117 <https://issues.apache.org/jira/browse/SPARK-10117> LIBSVM data source - LIBSVM as a SQL data source (a sketch follows these lists)

Documentation improvements

- SPARK-7751 <https://issues.apache.org/jira/browse/SPARK-7751> @since versions - Documentation includes the initial version when classes and methods were added
- SPARK-11337 <https://issues.apache.org/jira/browse/SPARK-11337> Testable example code - Automated testing for code in user guide examples

Deprecations

- In spark.mllib.clustering.KMeans, the "runs" parameter has been deprecated.
- In spark.ml.classification.LogisticRegressionModel and spark.ml.regression.LinearRegressionModel, the "weights" field has been deprecated in favor of the new name "coefficients". This helps disambiguate from instance (row) weights given to algorithms.
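As one example from the lists above, a sketch of the new LIBSVM data source; the file path is a placeholder:

    // LIBSVM data source sketch (SPARK-10117, Spark 1.6); the path is made up.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object LibsvmExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("libsvm").setMaster("local[*]"))
        val sqlContext = new SQLContext(sc)

        // Loads a DataFrame with a double "label" column and a vector
        // "features" column.
        val df = sqlContext.read.format("libsvm")
          .load("data/mllib/sample_libsvm_data.txt")
        df.show()
      }
    }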
Changes of behavior

- spark.mllib.tree.GradientBoostedTrees validationTol has changed semantics in 1.6. Previously, it was a threshold for absolute change in error. Now, it resembles the behavior of GradientDescent convergenceTol: for large errors, it uses relative error (relative to the previous error); for small errors (< 0.01), it uses absolute error.
- spark.ml.feature.RegexTokenizer: Previously, it did not convert strings to lowercase before tokenizing. Now, it converts to lowercase by default, with an option not to. This matches the behavior of the simpler Tokenizer transformer.
- Spark SQL's partition discovery has been changed to only discover partition directories that are children of the given path. (i.e., if path="/my/data/x=1", then x=1 is no longer considered a partition itself; only the children of x=1 are.) This behavior can be overridden by manually specifying the basePath that partition discovery should start with (SPARK-11678 <https://issues.apache.org/jira/browse/SPARK-11678>). A sketch of both behaviors follows this list.
- When casting a value of an integral type to timestamp (e.g. casting a long value to timestamp), the value is treated as being in seconds instead of milliseconds (SPARK-11724 <https://issues.apache.org/jira/browse/SPARK-11724>).
- With the improved query planner for queries having distinct aggregations (SPARK-9241 <https://issues.apache.org/jira/browse/SPARK-9241>), the plan of a query having a single distinct aggregation has been changed to a more robust version. To switch back to the plan generated by Spark 1.5's planner, please set spark.sql.specializeSingleDistinctAggPlanning to true (SPARK-12077 <https://issues.apache.org/jira/browse/SPARK-12077>).
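To make the partition discovery change concrete, here is a sketch of both behaviors; the paths are made up:

    // Partition discovery sketch (Spark 1.6); the paths are made up.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object BasePathExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("base-path").setMaster("local[*]"))
        val sqlContext = new SQLContext(sc)

        // 1.6 behavior: only directories below the given path are scanned,
        // so "x" is not discovered as a partition column here.
        sqlContext.read.parquet("/my/data/x=1").printSchema()

        // Setting basePath restores discovery from the root, so x=1 is
        // again treated as a partition of /my/data.
        sqlContext.read.option("basePath", "/my/data")
          .parquet("/my/data/x=1").printSchema()
      }
    }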