Re: [VOTE] Release Apache Spark 1.6.0 (RC1)

Nicholas Chammas Wed, 02 Dec 2015 12:39:53 -0800

-0

If spark-ec2 is still a supported part of the project, then we should
update its version lists as new releases are made. 1.5.2 had the same issue.


https://github.com/apache/spark/blob/v1.6.0-rc1/ec2/spark_ec2.py#L54-L91
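
As a rough illustration of why such a list goes stale, here is a minimal sketch of a hard-coded version whitelist. The names `KNOWN_SPARK_VERSIONS` and `validate_spark_version` are hypothetical and the entries are examples only; nothing here is copied from spark_ec2.py.

```python
# Hypothetical sketch of a hard-coded version whitelist; not the actual
# contents of spark_ec2.py.
KNOWN_SPARK_VERSIONS = {"1.5.0", "1.5.1"}  # example entries only


def validate_spark_version(version):
    """Return the version if it is whitelisted, else raise ValueError."""
    if version not in KNOWN_SPARK_VERSIONS:
        raise ValueError("Don't know about Spark version: %s" % version)
    return version
```

With a list like this, a request for any version released after the list was last edited fails until someone adds the new entry by hand.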

(I guess as part of the 2.0 discussions we should continue to discuss
whether spark-ec2 still belongs in the project. I'm starting to feel
awkward reporting spark-ec2 release issues...)

Nick

On Wed, Dec 2, 2015 at 3:27 PM Michael Armbrust <mich...@databricks.com>
wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 1.6.0!
>
> The vote is open until Saturday, December 5, 2015 at 21:00 UTC and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.6.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is *v1.6.0-rc1
> (bf525845cef159d2d4c9f4d64e158f037179b5c4)
> <https://github.com/apache/spark/tree/v1.6.0-rc1>*
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-rc1-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1165/
>
> The test repository (versioned as v1.6.0-rc1) for this release can be
> found at:
> https://repository.apache.org/content/repositories/orgapachespark-1164/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc1-docs/
>
>
> =======================================
> == How can I help test this release? ==
> =======================================
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> ================================================
> == What justifies a -1 vote for this release? ==
> ================================================
> This vote is happening towards the end of the 1.6 QA period, so -1 votes
> should only occur for significant regressions from 1.5. Bugs already
> present in 1.5, minor regressions, or bugs related to new features will not
> block this release.
>
> ===============================================================
> == What should happen to JIRA tickets still targeting 1.6.0? ==
> ===============================================================
> 1. It is OK for documentation patches to target 1.6.0 and still go into
> branch-1.6, since documentation will be published separately from the
> release.
> 2. New features for non-alpha-modules should target 1.7+.
> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
> version.
>
>
> ==================================================
> == Major changes to help you focus your testing ==
> ==================================================
>
> Spark SQL
>
>    - SPARK-10810 <https://issues.apache.org/jira/browse/SPARK-10810>
>    Session Management - The ability to create multiple isolated SQL
>    Contexts that have their own configuration and default database.  This is
>    turned on by default in the thrift server.
>    - SPARK-9999  <https://issues.apache.org/jira/browse/SPARK-9999> Dataset
>    API - A type-safe API (similar to RDDs) that performs many operations
>    on serialized binary data and code generation (i.e. Project Tungsten).
>    - SPARK-10000 <https://issues.apache.org/jira/browse/SPARK-10000> Unified
>    Memory Management - Shared memory for execution and caching instead of
>    exclusive division of the regions.
>    - SPARK-11197 <https://issues.apache.org/jira/browse/SPARK-11197> SQL
>    Queries on Files - Concise syntax for running SQL queries over files
>    of any supported format without registering a table.
>    - SPARK-11745 <https://issues.apache.org/jira/browse/SPARK-11745> Reading
>    non-standard JSON files - Added options to read non-standard JSON
>    files (e.g. single-quotes, unquoted attributes)
>    - SPARK-10412 <https://issues.apache.org/jira/browse/SPARK-10412>
>    Per-operator Metrics for SQL Execution - Display statistics on a
>    per-operator basis for memory usage and spilled data size.
>    - SPARK-11329 <https://issues.apache.org/jira/browse/SPARK-11329> Star
>    (*) expansion for StructTypes - Makes it easier to nest and unnest
>    arbitrary numbers of columns
>    - SPARK-10917 <https://issues.apache.org/jira/browse/SPARK-10917>,
>    SPARK-11149 <https://issues.apache.org/jira/browse/SPARK-11149> In-memory
>    Columnar Cache Performance - Significant (up to 14x) speed up when
>    caching data that contains complex types in DataFrames or SQL.
>    - SPARK-11111 <https://issues.apache.org/jira/browse/SPARK-11111> Fast
>    null-safe joins - Joins using null-safe equality (<=>) will now
>    execute using SortMergeJoin instead of computing a Cartesian product.
>    - SPARK-11389 <https://issues.apache.org/jira/browse/SPARK-11389> SQL
>    Execution Using Off-Heap Memory - Support for configuring query
>    execution to occur using off-heap memory to avoid GC overhead
>    - SPARK-10978 <https://issues.apache.org/jira/browse/SPARK-10978>
>    Datasource API Avoid Double Filter - When implementing a datasource
>    with filter pushdown, developers can now tell Spark SQL to avoid double
>    evaluating a pushed-down filter.
>    - SPARK-4849  <https://issues.apache.org/jira/browse/SPARK-4849> Advanced
>    Layout of Cached Data - storing partitioning and ordering schemes in
>    In-memory table scan, and adding distributeBy and localSort to DF API
>    - SPARK-9858  <https://issues.apache.org/jira/browse/SPARK-9858> Adaptive
>    query execution - Initial support for automatically selecting the
>    number of reducers for joins and aggregations.
>
> Spark Streaming
>
>    - API Updates
>       - SPARK-2629  <https://issues.apache.org/jira/browse/SPARK-2629> New
>       improved state management - trackStateByKey - a DStream
>       transformation for stateful stream processing, supersedes
>       updateStateByKey in functionality and performance.
>       - SPARK-11198 <https://issues.apache.org/jira/browse/SPARK-11198>
>       Kinesis record deaggregation - Kinesis streams have been upgraded
>       to use KCL 1.4.0 and support transparent deaggregation of
>       KPL-aggregated records.
>       - SPARK-10891 <https://issues.apache.org/jira/browse/SPARK-10891>
>       Kinesis message handler function - Allows an arbitrary function to
>       be applied to a Kinesis record in the Kinesis receiver, to customize
>       what data is to be stored in memory.
>       - SPARK-6328  <https://issues.apache.org/jira/browse/SPARK-6328>
>        Python Streaming Listener API - Get streaming statistics
>       (scheduling delays, batch processing times, etc.) in streaming.
>
>
>    - UI Improvements
>       - Made failures visible in the streaming tab, in the timelines,
>       batch list, and batch details page.
>       - Made output operations visible in the streaming tab as progress
>       bars
>
> MLlib
>
> New algorithms/models
>
>    - SPARK-8518  <https://issues.apache.org/jira/browse/SPARK-8518> Survival
>    analysis - Log-linear model for survival analysis
>    - SPARK-9834  <https://issues.apache.org/jira/browse/SPARK-9834> Normal
>    equation for least squares - Normal equation solver, providing R-like
>    model summary statistics
>    - SPARK-3147  <https://issues.apache.org/jira/browse/SPARK-3147> Online
>    hypothesis testing - A/B testing in the Spark Streaming framework
>    - SPARK-9930  <https://issues.apache.org/jira/browse/SPARK-9930> New
>    feature transformers - ChiSqSelector, QuantileDiscretizer, SQL
>    transformer
>    - SPARK-6517  <https://issues.apache.org/jira/browse/SPARK-6517> Bisecting
>    K-Means clustering - Fast top-down clustering variant of K-Means
>
> API improvements
>
>    - ML Pipelines
>       - SPARK-6725  <https://issues.apache.org/jira/browse/SPARK-6725>
>       Pipeline persistence - Save/load for ML Pipelines, with partial
>       coverage of spark.ml algorithms
>       - SPARK-5565  <https://issues.apache.org/jira/browse/SPARK-5565> LDA
>       in ML Pipelines - API for Latent Dirichlet Allocation in ML
>       Pipelines
>    - R API
>       - SPARK-9836  <https://issues.apache.org/jira/browse/SPARK-9836> R-like
>       statistics for GLMs - (Partial) R-like stats for ordinary least
>       squares via summary(model)
>       - SPARK-9681  <https://issues.apache.org/jira/browse/SPARK-9681> Feature
>       interactions in R formula - Interaction operator ":" in R formula
>    - Python API - Many improvements to Python API to approach feature
>    parity
>
> Misc improvements
>
>    - SPARK-7685  <https://issues.apache.org/jira/browse/SPARK-7685>,
>    SPARK-9642  <https://issues.apache.org/jira/browse/SPARK-9642> Instance
>    weights for GLMs - Logistic and Linear Regression can take instance
>    weights
>    - SPARK-10384 <https://issues.apache.org/jira/browse/SPARK-10384>,
>    SPARK-10385 <https://issues.apache.org/jira/browse/SPARK-10385> Univariate
>    and bivariate statistics in DataFrames - Variance, stddev,
>    correlations, etc.
>    - SPARK-10117 <https://issues.apache.org/jira/browse/SPARK-10117> LIBSVM
>    data source - LIBSVM as a SQL data source
>
> Documentation improvements
>
>    - SPARK-7751  <https://issues.apache.org/jira/browse/SPARK-7751> @since
>    versions - Documentation includes initial version when classes and
>    methods were added
>    - SPARK-11337 <https://issues.apache.org/jira/browse/SPARK-11337> Testable
>    example code - Automated testing for code in user guide examples
>
> Deprecations
>
>    - In spark.mllib.clustering.KMeans, the "runs" parameter has been
>    deprecated.
>    - In spark.ml.classification.LogisticRegressionModel and
>    spark.ml.regression.LinearRegressionModel, the "weights" field has been
>    deprecated, in favor of the new name "coefficients." This helps
>    disambiguate from instance (row) weights given to algorithms.
>
> Changes of behavior
>
>    - spark.mllib.tree.GradientBoostedTrees validationTol has changed
>    semantics in 1.6. Previously, it was a threshold for absolute change in
>    error. Now, it resembles the behavior of GradientDescent convergenceTol:
>    For large errors, it uses relative error (relative to the previous error);
>    for small errors (< 0.01), it uses absolute error.
>    - spark.ml.feature.RegexTokenizer: Previously, it did not convert
>    strings to lowercase before tokenizing. Now, it converts to lowercase by
>    default, with an option not to. This matches the behavior of the simpler
>    Tokenizer transformer.
>
>
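
The two behavior changes quoted above can be sketched in plain Python. This is one reading of the description, not the Spark implementation: the function names, the `small_error` parameter, and the default whitespace pattern are assumptions made for illustration.

```python
import re


def should_stop(previous_error, current_error, validation_tol, small_error=0.01):
    """Early-stopping check following the 1.6 description of validationTol.

    Assumption: "small" is judged on the previous error.
    """
    improvement = previous_error - current_error
    if previous_error < small_error:
        # Small errors: compare the absolute change against the tolerance.
        return improvement < validation_tol
    # Large errors: compare the change relative to the previous error.
    return improvement < validation_tol * previous_error


def regex_tokenize(text, pattern=r"\s+", to_lowercase=True):
    """Split text on a regex, lowercasing first unless told not to."""
    if to_lowercase:
        text = text.lower()
    return [token for token in re.split(pattern, text) if token]
```

Under this sketch, code tuned against the old absolute threshold may need its tolerance revisited, and callers that depend on case-sensitive tokens would pass `to_lowercase=False`.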
