I tried to run the test suite and encountered the following: http://pastebin.com/DPnwMGrm
FYI

On Wed, Dec 2, 2015 at 12:39 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> -0
>
> If spark-ec2 is still a supported part of the project, then we should
> update its version lists as new releases are made. 1.5.2 had the same issue.
>
> https://github.com/apache/spark/blob/v1.6.0-rc1/ec2/spark_ec2.py#L54-L91
>
> (I guess as part of the 2.0 discussions we should continue to discuss
> whether spark-ec2 still belongs in the project. I'm starting to feel
> awkward reporting spark-ec2 release issues...)
>
> Nick
>
> On Wed, Dec 2, 2015 at 3:27 PM Michael Armbrust <mich...@databricks.com>
> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 1.6.0!
>>
>> The vote is open until Saturday, December 5, 2015 at 21:00 UTC and passes
>> if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 1.6.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v1.6.0-rc1
>> (bf525845cef159d2d4c9f4d64e158f037179b5c4)
>> <https://github.com/apache/spark/tree/v1.6.0-rc1>
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-rc1-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1165/
>>
>> The test repository (versioned as v1.6.0-rc1) for this release can be
>> found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1164/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc1-docs/
>>
>>
>> =======================================
>> == How can I help test this release? ==
>> =======================================
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload, running it on this release candidate, and
>> reporting any regressions.
>>
>> ================================================
>> == What justifies a -1 vote for this release? ==
>> ================================================
>> This vote is happening towards the end of the 1.6 QA period, so -1 votes
>> should only occur for significant regressions from 1.5. Bugs already
>> present in 1.5, minor regressions, or bugs related to new features will not
>> block this release.
>>
>> ===============================================================
>> == What should happen to JIRA tickets still targeting 1.6.0? ==
>> ===============================================================
>> 1. It is OK for documentation patches to target 1.6.0 and still go into
>>    branch-1.6, since documentation will be published separately from the
>>    release.
>> 2. New features for non-alpha modules should target 1.7+.
>> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
>>    version.
>>
>>
>> ==================================================
>> == Major changes to help you focus your testing ==
>> ==================================================
>>
>> Spark SQL
>>
>> - SPARK-10810 <https://issues.apache.org/jira/browse/SPARK-10810>
>>   Session Management - The ability to create multiple isolated SQL
>>   Contexts that have their own configuration and default database. This is
>>   turned on by default in the thrift server.
>> - SPARK-9999 <https://issues.apache.org/jira/browse/SPARK-9999> Dataset
>>   API - A type-safe API (similar to RDDs) that performs many operations
>>   directly on serialized binary data and uses code generation (i.e. Project
>>   Tungsten).
>> - SPARK-10000 <https://issues.apache.org/jira/browse/SPARK-10000> Unified
>>   Memory Management - Shared memory for execution and caching instead
>>   of exclusive division of the regions.
>> - SPARK-11197 <https://issues.apache.org/jira/browse/SPARK-11197> SQL
>>   Queries on Files - Concise syntax for running SQL queries over files
>>   of any supported format without registering a table.
>> - SPARK-11745 <https://issues.apache.org/jira/browse/SPARK-11745> Reading
>>   non-standard JSON files - Added options to read non-standard JSON
>>   files (e.g. single quotes, unquoted attributes).
>> - SPARK-10412 <https://issues.apache.org/jira/browse/SPARK-10412>
>>   Per-operator Metrics for SQL Execution - Display statistics on a
>>   per-operator basis for memory usage and spilled data size.
>> - SPARK-11329 <https://issues.apache.org/jira/browse/SPARK-11329> Star
>>   (*) expansion for StructTypes - Makes it easier to nest and unnest
>>   arbitrary numbers of columns.
>> - SPARK-10917 <https://issues.apache.org/jira/browse/SPARK-10917>,
>>   SPARK-11149 <https://issues.apache.org/jira/browse/SPARK-11149> In-memory
>>   Columnar Cache Performance - Significant (up to 14x) speedup when
>>   caching data that contains complex types in DataFrames or SQL.
>> - SPARK-11111 <https://issues.apache.org/jira/browse/SPARK-11111> Fast
>>   null-safe joins - Joins using null-safe equality (<=>) will now
>>   execute using SortMergeJoin instead of computing a Cartesian product.
>> - SPARK-11389 <https://issues.apache.org/jira/browse/SPARK-11389> SQL
>>   Execution Using Off-Heap Memory - Support for configuring query
>>   execution to occur using off-heap memory to avoid GC overhead.
>> - SPARK-10978 <https://issues.apache.org/jira/browse/SPARK-10978>
>>   Datasource API Avoid Double Filter - When implementing a datasource with
>>   filter pushdown, developers can now tell Spark SQL to avoid
>>   double-evaluating a pushed-down filter.
>> - SPARK-4849 <https://issues.apache.org/jira/browse/SPARK-4849> Advanced
>>   Layout of Cached Data - Storing partitioning and ordering schemes in
>>   In-memory table scan, and adding distributeBy and localSort to DF API.
>> - SPARK-9858 <https://issues.apache.org/jira/browse/SPARK-9858> Adaptive
>>   query execution - Initial support for automatically selecting the
>>   number of reducers for joins and aggregations.
>>
>> Spark Streaming
>>
>> - API Updates
>>    - SPARK-2629 <https://issues.apache.org/jira/browse/SPARK-2629> New
>>      improved state management - trackStateByKey, a DStream
>>      transformation for stateful stream processing that supersedes
>>      updateStateByKey in functionality and performance.
>>    - SPARK-11198 <https://issues.apache.org/jira/browse/SPARK-11198>
>>      Kinesis record deaggregation - Kinesis streams have been upgraded to
>>      use KCL 1.4.0 and support transparent deaggregation of KPL-aggregated
>>      records.
>>    - SPARK-10891 <https://issues.apache.org/jira/browse/SPARK-10891>
>>      Kinesis message handler function - Allows an arbitrary function to be
>>      applied to a Kinesis record in the Kinesis receiver to customize what
>>      data is stored in memory.
>>    - SPARK-6328 <https://issues.apache.org/jira/browse/SPARK-6328>
>>      Python Streaming Listener API - Get streaming statistics
>>      (scheduling delays, batch processing times, etc.) in streaming.
>>
>>
>> - UI Improvements
>>    - Made failures visible in the streaming tab, in the timelines,
>>      batch list, and batch details page.
>>    - Made output operations visible in the streaming tab as progress
>>      bars.
>>
>> MLlib
>>
>> New algorithms/models
>>
>> - SPARK-8518 <https://issues.apache.org/jira/browse/SPARK-8518> Survival
>>   analysis - Log-linear model for survival analysis.
>> - SPARK-9834 <https://issues.apache.org/jira/browse/SPARK-9834> Normal
>>   equation for least squares - Normal equation solver, providing R-like
>>   model summary statistics.
>> - SPARK-3147 <https://issues.apache.org/jira/browse/SPARK-3147> Online
>>   hypothesis testing - A/B testing in the Spark Streaming framework.
>> - SPARK-9930 <https://issues.apache.org/jira/browse/SPARK-9930> New
>>   feature transformers - ChiSqSelector, QuantileDiscretizer, SQL
>>   transformer.
>> - SPARK-6517 <https://issues.apache.org/jira/browse/SPARK-6517> Bisecting
>>   K-Means clustering - Fast top-down clustering variant of K-Means.
>>
>> API improvements
>>
>> - ML Pipelines
>>    - SPARK-6725 <https://issues.apache.org/jira/browse/SPARK-6725>
>>      Pipeline persistence - Save/load for ML Pipelines, with partial
>>      coverage of spark.ml algorithms.
>>    - SPARK-5565 <https://issues.apache.org/jira/browse/SPARK-5565> LDA
>>      in ML Pipelines - API for Latent Dirichlet Allocation in ML
>>      Pipelines.
>> - R API
>>    - SPARK-9836 <https://issues.apache.org/jira/browse/SPARK-9836> R-like
>>      statistics for GLMs - (Partial) R-like stats for ordinary least
>>      squares via summary(model).
>>    - SPARK-9681 <https://issues.apache.org/jira/browse/SPARK-9681>
>>      Feature interactions in R formula - Interaction operator ":" in R
>>      formula.
>> - Python API - Many improvements to the Python API to approach feature
>>   parity.
>>
>> Misc improvements
>>
>> - SPARK-7685 <https://issues.apache.org/jira/browse/SPARK-7685>,
>>   SPARK-9642 <https://issues.apache.org/jira/browse/SPARK-9642> Instance
>>   weights for GLMs - Logistic and Linear Regression can take instance
>>   weights.
>> - SPARK-10384 <https://issues.apache.org/jira/browse/SPARK-10384>,
>>   SPARK-10385 <https://issues.apache.org/jira/browse/SPARK-10385> Univariate
>>   and bivariate statistics in DataFrames - Variance, stddev,
>>   correlations, etc.
>> - SPARK-10117 <https://issues.apache.org/jira/browse/SPARK-10117> LIBSVM
>>   data source - LIBSVM as a SQL data source.
>>
>> Documentation improvements
>>
>> - SPARK-7751 <https://issues.apache.org/jira/browse/SPARK-7751> @since
>>   versions - Documentation includes the version in which classes and
>>   methods were added.
>> - SPARK-11337 <https://issues.apache.org/jira/browse/SPARK-11337> Testable
>>   example code - Automated testing for code in user guide examples.
>>
>> Deprecations
>>
>> - In spark.mllib.clustering.KMeans, the "runs" parameter has been
>>   deprecated.
>> - In spark.ml.classification.LogisticRegressionModel and
>>   spark.ml.regression.LinearRegressionModel, the "weights" field has been
>>   deprecated in favor of the new name "coefficients." This helps
>>   disambiguate from instance (row) weights given to algorithms.
>>
>> Changes in behavior
>>
>> - spark.mllib.tree.GradientBoostedTrees validationTol has changed
>>   semantics in 1.6. Previously, it was a threshold for absolute change in
>>   error. Now, it resembles the behavior of GradientDescent convergenceTol:
>>   for large errors, it uses relative error (relative to the previous
>>   error); for small errors (< 0.01), it uses absolute error.
>> - spark.ml.feature.RegexTokenizer: Previously, it did not convert
>>   strings to lowercase before tokenizing. Now, it converts to lowercase by
>>   default, with an option not to. This matches the behavior of the simpler
>>   Tokenizer transformer.
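To make the validationTol change in GradientBoostedTrees concrete, here is a minimal standalone sketch (not Spark's code). The helper names `should_stop` and `old_should_stop` are hypothetical, and the sketch assumes the new rule compares the drop in validation error against `tol * max(previous_error, 0.01)`, which is one reading of "relative for large errors, absolute for small errors (< 0.01)". The exact formula in the 1.6 GradientBoostedTrees source may differ.

```python
def should_stop(previous_error: float, current_error: float, tol: float) -> bool:
    """Hypothetical early-stopping check sketching the 1.6 validationTol semantics.

    Assumption: the improvement (drop in validation error) is compared against
    tol scaled by max(previous_error, 0.01). For previous_error >= 0.01 this is
    a relative test; below 0.01 the scale is the constant 0.01, so the test
    becomes an absolute one.
    """
    improvement = previous_error - current_error
    scale = max(previous_error, 0.01)
    return improvement < tol * scale


def old_should_stop(previous_error: float, current_error: float, tol: float) -> bool:
    """Hypothetical pre-1.6 reading: tol is a plain threshold on the absolute change."""
    return previous_error - current_error < tol


if __name__ == "__main__":
    # Large error: a drop of 0.5 out of 100 is only a 0.5% relative improvement,
    # so the relative rule stops while the absolute rule keeps training.
    print(should_stop(100.0, 99.5, tol=0.01))      # True  (0.5 < 1.0)
    print(old_should_stop(100.0, 99.5, tol=0.01))  # False (0.5 >= 0.01)
    # Small error: the scale is clamped at 0.01, giving an absolute threshold of 1e-4.
    print(should_stop(0.005, 0.004, tol=0.01))     # False (0.001 >= 1e-4)
```

The practical consequence, under this assumption, is that the same validationTol value stops training much earlier when the loss is large than it did in 1.5, so workloads that tuned validationTol for the old absolute semantics may need to revisit it.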