+1

On Tue, Dec 22, 2015 at 7:05 PM Aaron Davidson <ilike...@gmail.com> wrote:

> +1
>
> On Tue, Dec 22, 2015 at 7:01 PM, Josh Rosen <joshro...@databricks.com>
> wrote:
>
>> +1
>>
>> On Tue, Dec 22, 2015 at 7:00 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>>
>>> +1
>>>
>>> On Wed, Dec 23, 2015 at 7:36 AM, Mark Hamstra <m...@clearstorydata.com>
>>> wrote:
>>>
>>>> +1
>>>>
>>>> On Tue, Dec 22, 2015 at 12:10 PM, Michael Armbrust <
>>>> mich...@databricks.com> wrote:
>>>>
>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>> version 1.6.0!
>>>>>
>>>>> The vote is open until Friday, December 25, 2015 at 18:00 UTC and
>>>>> passes if a majority of at least 3 +1 PMC votes are cast.
>>>>>
>>>>> [ ] +1 Release this package as Apache Spark 1.6.0
>>>>> [ ] -1 Do not release this package because ...
>>>>>
>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>
>>>>> The tag to be voted on is *v1.6.0-rc4
>>>>> (4062cda3087ae42c6c3cb24508fc1d3a931accdf)
>>>>> <https://github.com/apache/spark/tree/v1.6.0-rc4>*
>>>>>
>>>>> The release files, including signatures, digests, etc. can be found at:
>>>>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-bin/
>>>>>
>>>>> Release artifacts are signed with the following key:
>>>>> https://people.apache.org/keys/committer/pwendell.asc
>>>>>
>>>>> The staging repository for this release can be found at:
>>>>> https://repository.apache.org/content/repositories/orgapachespark-1176/
>>>>>
>>>>> The test repository (versioned as v1.6.0-rc4) for this release can be
>>>>> found at:
>>>>> https://repository.apache.org/content/repositories/orgapachespark-1175/
>>>>>
>>>>> The documentation corresponding to this release can be found at:
>>>>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-docs/
>>>>>
>>>>> =======================================
>>>>> == How can I help test this release? ==
>>>>> =======================================
>>>>> If you are a Spark user, you can help us test this release by taking
>>>>> an existing Spark workload, running it on this release candidate, and
>>>>> reporting any regressions.
>>>>>
>>>>> ================================================
>>>>> == What justifies a -1 vote for this release? ==
>>>>> ================================================
>>>>> This vote is happening towards the end of the 1.6 QA period, so -1
>>>>> votes should only occur for significant regressions from 1.5. Bugs
>>>>> already present in 1.5, minor regressions, or bugs related to new
>>>>> features will not block this release.
>>>>>
>>>>> ===============================================================
>>>>> == What should happen to JIRA tickets still targeting 1.6.0? ==
>>>>> ===============================================================
>>>>> 1. It is OK for documentation patches to target 1.6.0 and still go
>>>>> into branch-1.6, since documentation will be published separately
>>>>> from the release.
>>>>> 2. New features for non-alpha-modules should target 1.7+.
>>>>> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the
>>>>> target version.
>>>>>
>>>>>
>>>>> ==================================================
>>>>> == Major changes to help you focus your testing ==
>>>>> ==================================================
>>>>>
>>>>> Notable changes since 1.6 RC3
>>>>>
>>>>>   - SPARK-12404 - Fix serialization error for Datasets with
>>>>> Timestamps/Arrays/Decimal
>>>>>   - SPARK-12218 - Fix incorrect pushdown of filters to Parquet
>>>>>   - SPARK-12395 - Fix join columns of outer join for DataFrame using
>>>>> joinWith
>>>>>   - SPARK-12413 - Fix Mesos HA
>>>>>
>>>>> Notable changes since 1.6 RC2
>>>>> - SPARK_VERSION has been set correctly
>>>>> - SPARK-12199 ML Docs are publishing correctly
>>>>> - SPARK-12345 Mesos cluster mode has been fixed
>>>>>
>>>>> Notable changes since 1.6 RC1
>>>>> Spark Streaming
>>>>>
>>>>>    - SPARK-2629  <https://issues.apache.org/jira/browse/SPARK-2629>
>>>>>    trackStateByKey has been renamed to mapWithState
>>>>>
>>>>> Spark SQL
>>>>>
>>>>>    - SPARK-12165 <https://issues.apache.org/jira/browse/SPARK-12165>
>>>>>    SPARK-12189 <https://issues.apache.org/jira/browse/SPARK-12189> Fix
>>>>>    bugs in eviction of storage memory by execution.
>>>>>    - SPARK-12258 <https://issues.apache.org/jira/browse/SPARK-12258>
>>>>>    Correctly pass null into ScalaUDF
>>>>>
>>>>> Notable Features Since 1.5
>>>>>
>>>>> Spark SQL
>>>>>
>>>>>    - SPARK-11787 <https://issues.apache.org/jira/browse/SPARK-11787>
>>>>>    Parquet Performance - Improve Parquet scan performance when using
>>>>>    flat schemas.
>>>>>    - SPARK-10810 <https://issues.apache.org/jira/browse/SPARK-10810>
>>>>>    Session Management - Isolated default database (i.e., USE mydb)
>>>>>    even on shared clusters.
>>>>>    - SPARK-9999  <https://issues.apache.org/jira/browse/SPARK-9999>
>>>>>    Dataset API - A type-safe API (similar to RDDs) that performs many
>>>>>    operations directly on serialized binary data and uses code
>>>>>    generation (i.e. Project Tungsten). A sketch follows this list.
>>>>>    - SPARK-10000 <https://issues.apache.org/jira/browse/SPARK-10000>
>>>>>    Unified Memory Management - Shared memory for execution and
>>>>>    caching instead of exclusive division of the regions.
>>>>>    - SPARK-11197 <https://issues.apache.org/jira/browse/SPARK-11197> SQL
>>>>>    Queries on Files - Concise syntax for running SQL queries over
>>>>>    files of any supported format without registering a table
>>>>>    (example after this list).
>>>>>    - SPARK-11745 <https://issues.apache.org/jira/browse/SPARK-11745>
>>>>>    Reading non-standard JSON files - Added options to read
>>>>>    non-standard JSON files (e.g. single-quotes, unquoted attributes)
>>>>>    - SPARK-10412 <https://issues.apache.org/jira/browse/SPARK-10412>
>>>>>    Per-operator Metrics for SQL Execution - Display statistics on a
>>>>>    per-operator basis for memory usage and spilled data size.
>>>>>    - SPARK-11329 <https://issues.apache.org/jira/browse/SPARK-11329> Star
>>>>>    (*) expansion for StructTypes - Makes it easier to nest and unnest
>>>>>    arbitrary numbers of columns
>>>>>    - SPARK-10917 <https://issues.apache.org/jira/browse/SPARK-10917>,
>>>>>    SPARK-11149 <https://issues.apache.org/jira/browse/SPARK-11149>
>>>>>    In-memory Columnar Cache Performance - Significant (up to 14x)
>>>>>    speed up when caching data that contains complex types in
>>>>>    DataFrames or SQL.
>>>>>    - SPARK-11111 <https://issues.apache.org/jira/browse/SPARK-11111> Fast
>>>>>    null-safe joins - Joins using null-safe equality (<=>) will now
>>>>>    execute using SortMergeJoin instead of computing a Cartesian
>>>>>    product (example after this list).
>>>>>    - SPARK-11389 <https://issues.apache.org/jira/browse/SPARK-11389> SQL
>>>>>    Execution Using Off-Heap Memory - Support for configuring query
>>>>>    execution to occur using off-heap memory to avoid GC overhead.
>>>>>    - SPARK-10978 <https://issues.apache.org/jira/browse/SPARK-10978>
>>>>>    Datasource API Avoid Double Filter - When implementing a
>>>>>    datasource with filter pushdown, developers can now tell Spark SQL
>>>>>    to avoid double evaluating a pushed-down filter.
>>>>>    - SPARK-4849  <https://issues.apache.org/jira/browse/SPARK-4849>
>>>>>    Advanced Layout of Cached Data - Store partitioning and ordering
>>>>>    schemes in in-memory table scans, and add distributeBy and
>>>>>    localSort to the DataFrame API
>>>>>    - SPARK-9858  <https://issues.apache.org/jira/browse/SPARK-9858>
>>>>>    Adaptive query execution - Initial support for automatically
>>>>>    selecting the number of reducers for joins and aggregations.
>>>>>    - SPARK-9241  <https://issues.apache.org/jira/browse/SPARK-9241>
>>>>>    Improved query planner for queries having distinct aggregations -
>>>>>    Query plans of distinct aggregations are more robust when distinct
>>>>>    columns have high cardinality.
>>>>>
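>>>>> As a quick illustration of the Dataset API item above, here is a
>>>>> minimal sketch in Scala (the case class, the file path, and an
>>>>> existing SQLContext are all assumptions for illustration):
>>>>>
>>>>>     import sqlContext.implicits._  // encoders for case classes
>>>>>
>>>>>     case class Person(name: String, age: Long)
>>>>>
>>>>>     // Read JSON into a typed Dataset[Person] rather than a DataFrame
>>>>>     val people = sqlContext.read.json("examples/people.json").as[Person]
>>>>>
>>>>>     // These lambdas are type-checked at compile time, unlike
>>>>>     // DataFrame expressions
>>>>>     val adults = people.filter(_.age >= 18).map(_.name)
>>>>>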
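>>>>> The SQL-queries-on-files syntax can be exercised like this (the
>>>>> Parquet path is a placeholder):
>>>>>
>>>>>     // Query a file directly; the "parquet." prefix selects the
>>>>>     // data source, so no temporary table needs to be registered
>>>>>     val df = sqlContext.sql(
>>>>>       "SELECT * FROM parquet.`examples/users.parquet`")
>>>>>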
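>>>>> And a small sketch of a null-safe join (df1 and df2 are placeholder
>>>>> DataFrames sharing a "key" column):
>>>>>
>>>>>     // <=> treats two NULL keys as equal; with SPARK-11111 this plans
>>>>>     // as a SortMergeJoin instead of a Cartesian product plus filter
>>>>>     val joined = df1.join(df2, df1("key") <=> df2("key"))
>>>>>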
>>>>> Spark Streaming
>>>>>
>>>>>    - API Updates
>>>>>       - SPARK-2629  <https://issues.apache.org/jira/browse/SPARK-2629>
>>>>>        New improved state management - mapWithState - a DStream
>>>>>       transformation for stateful stream processing, supersedes
>>>>>       updateStateByKey in functionality and performance (see the
>>>>>       sketch after this list).
>>>>>       - SPARK-11198
>>>>>       <https://issues.apache.org/jira/browse/SPARK-11198> Kinesis
>>>>>       record deaggregation - Kinesis streams have been upgraded to
>>>>>       use KCL 1.4.0 and support transparent deaggregation of
>>>>>       KPL-aggregated records.
>>>>>       - SPARK-10891
>>>>>       <https://issues.apache.org/jira/browse/SPARK-10891> Kinesis
>>>>>       message handler function - Allows an arbitrary function to be
>>>>>       applied to a Kinesis record in the Kinesis receiver, to
>>>>>       customize what data is stored in memory.
>>>>>       - SPARK-6328  <https://issues.apache.org/jira/browse/SPARK-6328>
>>>>>        Python Streaming Listener API - Get streaming statistics
>>>>>       (scheduling delays, batch processing times, etc.) in streaming.
>>>>>
>>>>>
>>>>>    - UI Improvements
>>>>>       - Made failures visible in the streaming tab, in the timelines,
>>>>>       batch list, and batch details page.
>>>>>       - Made output operations visible in the streaming tab as
>>>>>       progress bars.
>>>>>
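>>>>> For the mapWithState item above, a minimal sketch (it assumes a
>>>>> checkpointed StreamingContext and a DStream[(String, Int)] named
>>>>> "events"; all names are placeholders):
>>>>>
>>>>>     import org.apache.spark.streaming.{State, StateSpec}
>>>>>
>>>>>     // Keep a running sum per key; State carries data across batches
>>>>>     val trackingFunc =
>>>>>       (key: String, value: Option[Int], state: State[Int]) => {
>>>>>         val sum = value.getOrElse(0) + state.getOption.getOrElse(0)
>>>>>         state.update(sum)
>>>>>         (key, sum)  // the record emitted for this batch
>>>>>       }
>>>>>
>>>>>     val sums = events.mapWithState(StateSpec.function(trackingFunc))
>>>>>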
>>>>> MLlib
>>>>>
>>>>> New algorithms/models
>>>>>
>>>>>    - SPARK-8518  <https://issues.apache.org/jira/browse/SPARK-8518>
>>>>>    Survival analysis - Log-linear model for survival analysis
>>>>>    - SPARK-9834  <https://issues.apache.org/jira/browse/SPARK-9834> Normal
>>>>>    equation for least squares - Normal equation solver, providing
>>>>>    R-like model summary statistics
>>>>>    - SPARK-3147  <https://issues.apache.org/jira/browse/SPARK-3147> Online
>>>>>    hypothesis testing - A/B testing in the Spark Streaming framework
>>>>>    - SPARK-9930  <https://issues.apache.org/jira/browse/SPARK-9930> New
>>>>>    feature transformers - ChiSqSelector, QuantileDiscretizer, SQL
>>>>>    transformer
>>>>>    - SPARK-6517  <https://issues.apache.org/jira/browse/SPARK-6517>
>>>>>    Bisecting K-Means clustering - Fast top-down clustering variant of
>>>>>    K-Means
>>>>>
>>>>> API improvements
>>>>>
>>>>>    - ML Pipelines
>>>>>       - SPARK-6725  <https://issues.apache.org/jira/browse/SPARK-6725>
>>>>>        Pipeline persistence - Save/load for ML Pipelines, with
>>>>>       partial coverage of spark.ml algorithms
>>>>>       - SPARK-5565  <https://issues.apache.org/jira/browse/SPARK-5565>
>>>>>        LDA in ML Pipelines - API for Latent Dirichlet Allocation in
>>>>>       ML Pipelines
>>>>>    - R API
>>>>>       - SPARK-9836  <https://issues.apache.org/jira/browse/SPARK-9836>
>>>>>        R-like statistics for GLMs - (Partial) R-like stats for
>>>>>       ordinary least squares via summary(model)
>>>>>       - SPARK-9681  <https://issues.apache.org/jira/browse/SPARK-9681>
>>>>>        Feature interactions in R formula - Interaction operator ":"
>>>>>       in R formula
>>>>>    - Python API - Many improvements to Python API to approach feature
>>>>>    parity
>>>>>
>>>>> Misc improvements
>>>>>
>>>>>    - SPARK-7685  <https://issues.apache.org/jira/browse/SPARK-7685>,
>>>>>    SPARK-9642  <https://issues.apache.org/jira/browse/SPARK-9642>
>>>>>    Instance weights for GLMs - Logistic and Linear Regression can
>>>>>    take instance weights (sketch below)
>>>>>    - SPARK-10384 <https://issues.apache.org/jira/browse/SPARK-10384>,
>>>>>    SPARK-10385 <https://issues.apache.org/jira/browse/SPARK-10385>
>>>>>    Univariate and bivariate statistics in DataFrames - Variance,
>>>>>    stddev, correlations, etc. (example below)
>>>>>    - SPARK-10117 <https://issues.apache.org/jira/browse/SPARK-10117>
>>>>>    LIBSVM data source - LIBSVM as a SQL data source (example below)
>>>>>
>>>>> Documentation improvements
>>>>>
>>>>>    - SPARK-7751  <https://issues.apache.org/jira/browse/SPARK-7751> @since
>>>>>    versions - Documentation includes initial version when classes and
>>>>>    methods were added
>>>>>    - SPARK-11337 <https://issues.apache.org/jira/browse/SPARK-11337>
>>>>>    Testable example code - Automated testing for code in user guide
>>>>>    examples
>>>>>
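>>>>> A sketch of instance weights for GLMs (assumes a DataFrame "training"
>>>>> with label, features, and a numeric "weight" column; the names are
>>>>> placeholders):
>>>>>
>>>>>     import org.apache.spark.ml.classification.LogisticRegression
>>>>>
>>>>>     // Rows with larger "weight" values count more during fitting
>>>>>     val model = new LogisticRegression()
>>>>>       .setWeightCol("weight")
>>>>>       .fit(training)
>>>>>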
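>>>>> The new DataFrame statistics are exposed as SQL functions (df and its
>>>>> columns x and y are placeholders):
>>>>>
>>>>>     import org.apache.spark.sql.functions.{corr, stddev, variance}
>>>>>
>>>>>     // Whole-column statistics computed in a single aggregation
>>>>>     df.select(variance(df("x")), stddev(df("x")),
>>>>>       corr(df("x"), df("y"))).show()
>>>>>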
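>>>>> And the LIBSVM data source (the path is a placeholder):
>>>>>
>>>>>     // Yields a DataFrame with "label" and "features" columns
>>>>>     val data = sqlContext.read.format("libsvm")
>>>>>       .load("data/sample_libsvm_data.txt")
>>>>>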
>>>>> Deprecations
>>>>>
>>>>>    - In spark.mllib.clustering.KMeans, the "runs" parameter has been
>>>>>    deprecated.
>>>>>    - In spark.ml.classification.LogisticRegressionModel and
>>>>>    spark.ml.regression.LinearRegressionModel, the "weights" field has been
>>>>>    deprecated, in favor of the new name "coefficients." This helps
>>>>>    disambiguate from instance (row) weights given to algorithms.
>>>>>
>>>>> Changes of behavior
>>>>>
>>>>>    - spark.mllib.tree.GradientBoostedTrees validationTol has changed
>>>>>    semantics in 1.6. Previously, it was a threshold for absolute
>>>>>    change in error. Now, it resembles the behavior of GradientDescent
>>>>>    convergenceTol: for large errors, it uses relative error (relative
>>>>>    to the previous error); for small errors (< 0.01), it uses
>>>>>    absolute error.
>>>>>    - spark.ml.feature.RegexTokenizer: Previously, it did not convert
>>>>>    strings to lowercase before tokenizing. Now, it converts to
>>>>>    lowercase by default, with an option not to. This matches the
>>>>>    behavior of the simpler Tokenizer transformer (see the sketch
>>>>>    after this list).
>>>>>    - Spark SQL's partition discovery has been changed to only
>>>>>    discover partition directories that are children of the given
>>>>>    path (i.e. if path="/my/data/x=1" then x=1 will no longer be
>>>>>    considered a partition; only children of x=1 will be). This
>>>>>    behavior can be overridden by manually specifying the basePath
>>>>>    that partitioning discovery should start with (SPARK-11678
>>>>>    <https://issues.apache.org/jira/browse/SPARK-11678>; example
>>>>>    after this list).
>>>>>    - When casting a value of an integral type to timestamp (e.g.
>>>>>    casting a long value to timestamp), the value is treated as being
>>>>>    in seconds instead of milliseconds (SPARK-11724
>>>>>    <https://issues.apache.org/jira/browse/SPARK-11724>; example
>>>>>    after this list).
>>>>>    - With the improved query planner for queries having distinct
>>>>>    aggregations (SPARK-9241
>>>>>    <https://issues.apache.org/jira/browse/SPARK-9241>), the plan of a
>>>>>    query having a single distinct aggregation has been changed to a
>>>>>    more robust version. To switch back to the plan generated by Spark
>>>>>    1.5's planner, set spark.sql.specializeSingleDistinctAggPlanning
>>>>>    to true (SPARK-12077
>>>>>    <https://issues.apache.org/jira/browse/SPARK-12077>).
>>>>>
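>>>>> For the RegexTokenizer change, a sketch of restoring the pre-1.6
>>>>> behavior (column names are placeholders):
>>>>>
>>>>>     import org.apache.spark.ml.feature.RegexTokenizer
>>>>>
>>>>>     val tokenizer = new RegexTokenizer()
>>>>>       .setInputCol("text")
>>>>>       .setOutputCol("words")
>>>>>       .setToLowercase(false)  // default is now true; false matches 1.5
>>>>>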
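>>>>> For the partition discovery change, a sketch of the basePath override
>>>>> (the directory layout under /my/data is a placeholder):
>>>>>
>>>>>     // Without basePath, reading /my/data/x=1 would not treat x as a
>>>>>     // partition column; with it, x=1 is discovered as a partition
>>>>>     val df = sqlContext.read
>>>>>       .option("basePath", "/my/data")
>>>>>       .parquet("/my/data/x=1")
>>>>>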
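>>>>> And a quick check of the new integral-to-timestamp cast semantics:
>>>>>
>>>>>     // 1450000000 is now read as seconds since the epoch, i.e. a date
>>>>>     // in Dec 2015; 1.5 would have read the same value as milliseconds
>>>>>     sqlContext.sql("SELECT CAST(1450000000 AS TIMESTAMP)").show()
>>>>>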
>>>>>
>>>>
>>>
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>>>
>>
>>
>
