+1 (non-binding). Tested with different samples.

Regards,
JB
Sent from my Samsung device

-------- Original message --------
From: Michael Armbrust <mich...@databricks.com>
Date: 12/12/2015 18:39 (GMT+01:00)
To: dev@spark.apache.org
Subject: [VOTE] Release Apache Spark 1.6.0 (RC2)

Please vote on releasing the following candidate as Apache Spark version 1.6.0!

The vote is open until Tuesday, December 15, 2015 at 6:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.6.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v1.6.0-rc2 (23f8dfd45187cb8f2216328ab907ddb5fbdffd0b)

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1169/

The test repository (versioned as v1.6.0-rc2) for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1168/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-docs/

=========================================
How can I help test this release?
=========================================
If you are a Spark user, you can help us test this release by taking an existing Spark workload, running it on this release candidate, and reporting any regressions.

==================================================
What justifies a -1 vote for this release?
==================================================
This vote is happening towards the end of the 1.6 QA period, so -1 votes should only occur for significant regressions from 1.5. Bugs already present in 1.5, minor regressions, or bugs related to new features will not block this release.

=================================================================
What should happen to JIRA tickets still targeting 1.6.0?
=================================================================
1. It is OK for documentation patches to target 1.6.0 and still go into branch-1.6, since documentation will be published separately from the release.
2. New features for non-alpha modules should target 1.7+.
3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target version.

====================================================
Major changes to help you focus your testing
====================================================

Spark 1.6.0 Preview

Notable changes since 1.6 RC1

Spark Streaming
- SPARK-2629 trackStateByKey has been renamed to mapWithState (see the sketch after these RC1 notes).

Spark SQL
- SPARK-12165 SPARK-12189 Fix bugs in eviction of storage memory by execution.
- SPARK-12258 Correct passing null into ScalaUDF.
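If you want to exercise the renamed API while testing, here is a minimal sketch of a running word count with mapWithState. The socket source, port 9999, and the /tmp checkpoint path are placeholders of mine, not part of the release notes:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

object MapWithStateSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MapWithStateSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))
    ssc.checkpoint("/tmp/checkpoint") // mapWithState requires a checkpoint directory

    // Placeholder source: turn socket lines into (word, 1) pairs.
    val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" ")).map((_, 1))

    // State function: keep a running count per word.
    val mappingFunc = (word: String, one: Option[Int], state: State[Int]) => {
      val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
      state.update(sum)
      (word, sum) // emitted downstream, unlike updateStateByKey
    }

    words.mapWithState(StateSpec.function(mappingFunc)).print()
    ssc.start()
    ssc.awaitTermination()
  }
}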
Notable Features Since 1.5

Spark SQL
- SPARK-11787 Parquet Performance - Improve Parquet scan performance when using flat schemas.
- SPARK-10810 Session Management - Isolated default database (i.e. USE mydb) even on shared clusters.
- SPARK-9999 Dataset API - A type-safe API (similar to RDDs) that performs many operations on serialized binary data and code generation (i.e. Project Tungsten). See the sketch after this list.
- SPARK-10000 Unified Memory Management - Shared memory for execution and caching instead of exclusive division of the regions.
- SPARK-11197 SQL Queries on Files - Concise syntax for running SQL queries over files of any supported format without registering a table.
- SPARK-11745 Reading non-standard JSON files - Added options to read non-standard JSON files (e.g. single quotes, unquoted attributes).
- SPARK-10412 Per-operator Metrics for SQL Execution - Display statistics on a per-operator basis for memory usage and spilled data size.
- SPARK-11329 Star (*) expansion for StructTypes - Makes it easier to nest and unnest arbitrary numbers of columns.
- SPARK-10917, SPARK-11149 In-memory Columnar Cache Performance - Significant (up to 14x) speedup when caching data that contains complex types in DataFrames or SQL.
- SPARK-11111 Fast null-safe joins - Joins using null-safe equality (<=>) will now execute using SortMergeJoin instead of computing a Cartesian product.
- SPARK-11389 SQL Execution Using Off-Heap Memory - Support for configuring query execution to occur using off-heap memory to avoid GC overhead.
- SPARK-10978 Datasource API Avoid Double Filter - When implementing a data source with filter pushdown, developers can now tell Spark SQL to avoid double-evaluating a pushed-down filter.
- SPARK-4849 Advanced Layout of Cached Data - Storing partitioning and ordering schemes in in-memory table scan, and adding distributeBy and localSort to the DataFrame API.
- SPARK-9858 Adaptive query execution - Initial support for automatically selecting the number of reducers for joins and aggregations.
- SPARK-9241 Improved query planner for queries having distinct aggregations - Query plans of distinct aggregations are more robust when distinct columns have high cardinality.
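For testing SPARK-9999 and SPARK-11197 together, a minimal sketch; the Parquet path below is a placeholder I made up, so point it at any Parquet file you have handy:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SparkSqlSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SparkSqlSketch").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // provides toDS() and the Int encoder used below

    // SPARK-9999: the new type-safe Dataset API.
    val ds = Seq(1, 2, 3).toDS()
    ds.map(_ + 1).collect().foreach(println) // prints 2, 3, 4

    // SPARK-11197: run SQL directly over a file without registering a table.
    val df = sqlContext.sql("SELECT * FROM parquet.`/path/to/data.parquet`")
    df.show()
  }
}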
Spark Streaming

API Updates
- SPARK-2629 New improved state management - mapWithState, a DStream transformation for stateful stream processing, supersedes updateStateByKey in functionality and performance.
- SPARK-11198 Kinesis record deaggregation - Kinesis streams have been upgraded to use KCL 1.4.0 and support transparent deaggregation of KPL-aggregated records.
- SPARK-10891 Kinesis message handler function - Allows an arbitrary function to be applied to a Kinesis record in the Kinesis receiver to customize what data is to be stored in memory.
- SPARK-6328 Python Streaming Listener API - Get streaming statistics (scheduling delays, batch processing times, etc.) in streaming.

UI Improvements
- Made failures visible in the streaming tab, in the timelines, batch list, and batch details page.
- Made output operations visible in the streaming tab as progress bars.

MLlib

New algorithms/models
- SPARK-8518 Survival analysis - Log-linear model for survival analysis.
- SPARK-9834 Normal equation for least squares - Normal equation solver, providing R-like model summary statistics.
- SPARK-3147 Online hypothesis testing - A/B testing in the Spark Streaming framework.
- SPARK-9930 New feature transformers - ChiSqSelector, QuantileDiscretizer, SQL transformer.
- SPARK-6517 Bisecting K-Means clustering - Fast top-down clustering variant of K-Means.

API improvements

ML Pipelines
- SPARK-6725 Pipeline persistence - Save/load for ML Pipelines, with partial coverage of spark.ml algorithms.
- SPARK-5565 LDA in ML Pipelines - API for Latent Dirichlet Allocation in ML Pipelines.

R API
- SPARK-9836 R-like statistics for GLMs - (Partial) R-like stats for ordinary least squares via summary(model).
- SPARK-9681 Feature interactions in R formula - Interaction operator ":" in R formula.

Python API - Many improvements to the Python API to approach feature parity.

Misc improvements
- SPARK-7685, SPARK-9642 Instance weights for GLMs - Logistic and Linear Regression can take instance weights.
- SPARK-10384, SPARK-10385 Univariate and bivariate statistics in DataFrames - Variance, stddev, correlations, etc.
- SPARK-10117 LIBSVM data source - LIBSVM as a SQL data source.

Documentation improvements
- SPARK-7751 @since versions - Documentation includes the initial version when classes and methods were added.
- SPARK-11337 Testable example code - Automated testing for code in user guide examples.

Deprecations
- In spark.mllib.clustering.KMeans, the "runs" parameter has been deprecated.
- In spark.ml.classification.LogisticRegressionModel and spark.ml.regression.LinearRegressionModel, the "weights" field has been deprecated in favor of the new name "coefficients". This helps disambiguate from instance (row) weights given to algorithms.

Changes of behavior
- spark.mllib.tree.GradientBoostedTrees: validationTol has changed semantics in 1.6. Previously, it was a threshold for absolute change in error. Now, it resembles the behavior of GradientDescent convergenceTol: for large errors, it uses relative error (relative to the previous error); for small errors (< 0.01), it uses absolute error.
- spark.ml.feature.RegexTokenizer: Previously, it did not convert strings to lowercase before tokenizing. Now, it converts to lowercase by default, with an option not to. This matches the behavior of the simpler Tokenizer transformer.
- Spark SQL's partition discovery has been changed to only discover partition directories that are children of the given path (i.e. if path="/my/data/x=1", then x=1 will no longer be considered a partition; only children of x=1 will be). This behavior can be overridden by manually specifying the basePath that partition discovery should start with (SPARK-11678). See the sketch after this list.
- When casting a value of an integral type to timestamp (e.g. casting a long value to timestamp), the value is treated as being in seconds instead of milliseconds (SPARK-11724).
- With the improved query planner for queries having distinct aggregations (SPARK-9241), the plan of a query having a single distinct aggregation has been changed to a more robust version. To switch back to the plan generated by Spark 1.5's planner, set spark.sql.specializeSingleDistinctAggPlanning to true (SPARK-12077).
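A minimal sketch for exercising the SQL behavior changes above; the /my/data paths are placeholders of mine and assume a Parquet table root partitioned by a column x:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object BehaviorChangesSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("BehaviorChangesSketch").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)

    // SPARK-11678: by default, only children of the given path are discovered as
    // partitions, so reading /my/data/x=1 directly would no longer yield an x column.
    // Setting basePath restores discovery from the table root, so x reappears.
    val df = sqlContext.read
      .option("basePath", "/my/data")
      .parquet("/my/data/x=1")
    df.printSchema()

    // SPARK-11724: the long is now interpreted as seconds, so this yields a date in
    // December 2015 rather than the January 1970 you would get from milliseconds.
    sqlContext.sql("SELECT CAST(1450000000 AS TIMESTAMP)").show()

    // SPARK-12077: opt back into the 1.5 planner for single distinct aggregations.
    sqlContext.setConf("spark.sql.specializeSingleDistinctAggPlanning", "true")
  }
}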