Yes, that's what I mean. If they're not quite working, let's disable
them, but first we have to rule out that I'm just missing some
requirement.

Functionally, it's not worth blocking the release. It seems like bad
form to release with tests that always fail for a non-trivial number
of users, but we have to establish that first. If it's something with
an easy fix (or just needs disabling) and another RC needs to be baked
anyway, it might be worth including.

Logs coming offline

On Fri, Dec 18, 2015 at 5:30 PM, Mark Grover <m...@apache.org> wrote:
> Sean,
> Are you referring to the docker integration tests? If so, they were disabled for
> the majority of the release cycle; I recently worked on them (SPARK-11796), and once
> that work was committed, the tests were re-enabled in the Spark builds. I am not sure
> what OSs the test builds use, but they should be passing there too.
>
> During my work, I tested on Ubuntu Precise and they worked. If you could
> share the logs with me offline, I could take a look. Alternatively, I can
> try to get an Ubuntu 15 instance. However, given the history of
> these tests, I personally don't think it makes sense to block the release
> based on them not running on Ubuntu 15.
>
> On Fri, Dec 18, 2015 at 9:22 AM, Sean Owen <so...@cloudera.com> wrote:
>>
>> For me, mostly the same as before: the tests largely pass, but I can
>> never get the docker tests to pass. If anyone knows of a special profile
>> or package that needs to be enabled, I can try that and/or
>> fix/document it. Just wondering if it's only me.
>>
>> I'm on Java 7 + Ubuntu 15.10, with -Pyarn -Phive -Phive-thriftserver
>> -Phadoop-2.6
>>
>> On Wed, Dec 16, 2015 at 9:32 PM, Michael Armbrust
>> <mich...@databricks.com> wrote:
>> > Please vote on releasing the following candidate as Apache Spark version
>> > 1.6.0!
>> >
>> > The vote is open until Saturday, December 19, 2015 at 18:00 UTC and passes
>> > if a majority of at least 3 +1 PMC votes are cast.
>> >
>> > [ ] +1 Release this package as Apache Spark 1.6.0
>> > [ ] -1 Do not release this package because ...
>> >
>> > To learn more about Apache Spark, please see http://spark.apache.org/
>> >
>> > The tag to be voted on is v1.6.0-rc3
>> > (168c89e07c51fa24b0bb88582c739cec0acb44d7)
>> >
>> > The release files, including signatures, digests, etc. can be found at:
>> > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-bin/
>> >
>> > Release artifacts are signed with the following key:
>> > https://people.apache.org/keys/committer/pwendell.asc
>> >
>> > The staging repository for this release can be found at:
>> > https://repository.apache.org/content/repositories/orgapachespark-1174/
>> >
>> > The test repository (versioned as v1.6.0-rc3) for this release can be found at:
>> > https://repository.apache.org/content/repositories/orgapachespark-1173/
>> >
>> > The documentation corresponding to this release can be found at:
>> > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-docs/
>> >
>> > =======================================
>> > == How can I help test this release? ==
>> > =======================================
>> > If you are a Spark user, you can help us test this release by taking an
>> > existing Spark workload and running on this release candidate, then
>> > reporting any regressions.
>> >
>> > ================================================
>> > == What justifies a -1 vote for this release? ==
>> > ================================================
>> > This vote is happening towards the end of the 1.6 QA period, so -1 votes
>> > should only occur for significant regressions from 1.5. Bugs already
>> > present
>> > in 1.5, minor regressions, or bugs related to new features will not
>> > block
>> > this release.
>> >
>> > ===============================================================
>> > == What should happen to JIRA tickets still targeting 1.6.0? ==
>> > ===============================================================
>> > 1. It is OK for documentation patches to target 1.6.0 and still go into
>> > branch-1.6, since documentations will be published separately from the
>> > release.
>> > 2. New features for non-alpha-modules should target 1.7+.
>> > 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the
>> > target
>> > version.
>> >
>> >
>> > ==================================================
>> > == Major changes to help you focus your testing ==
>> > ==================================================
>> >
>> > Notable changes since 1.6 RC2
>> >
>> >
>> > - SPARK_VERSION has been set correctly
>> > - SPARK-12199 ML Docs are publishing correctly
>> > - SPARK-12345 Mesos cluster mode has been fixed
>> >
>> > Notable changes since 1.6 RC1
>> >
>> > Spark Streaming
>> >
>> > SPARK-2629  trackStateByKey has been renamed to mapWithState
>> >
>> > Spark SQL
>> >
>> > SPARK-12165 SPARK-12189 Fix bugs in eviction of storage memory by
>> > execution.
>> > SPARK-12258 Correctly pass null into ScalaUDF
>> >
>> > Notable Features Since 1.5
>> >
>> > Spark SQL
>> >
>> > SPARK-11787 Parquet Performance - Improve Parquet scan performance when
>> > using flat schemas.
>> > SPARK-10810 Session Management - Isolated default database (i.e. USE mydb)
>> > even on shared clusters.
>> > SPARK-9999  Dataset API - A type-safe API (similar to RDDs) that performs
>> > many operations on serialized binary data and code generation (i.e.
>> > Project Tungsten). See the short sketch after this list.
>> > SPARK-10000 Unified Memory Management - Shared memory for execution and
>> > caching instead of exclusive division of the regions.
>> > SPARK-11197 SQL Queries on Files - Concise syntax for running SQL queries
>> > over files of any supported format without registering a table (see the
>> > sketch after this list).
>> > SPARK-11745 Reading non-standard JSON files - Added options to read
>> > non-standard JSON files (e.g. single-quotes, unquoted attributes)
>> > SPARK-10412 Per-operator Metrics for SQL Execution - Display statistics
>> > on a per-operator basis for memory usage and spilled data size.
>> > SPARK-11329 Star (*) expansion for StructTypes - Makes it easier to nest
>> > and unnest arbitrary numbers of columns
>> > SPARK-10917, SPARK-11149 In-memory Columnar Cache Performance -
>> > Significant
>> > (up to 14x) speed up when caching data that contains complex types in
>> > DataFrames or SQL.
>> > SPARK-11111 Fast null-safe joins - Joins using null-safe equality (<=>)
>> > will now execute using SortMergeJoin instead of computing a cartesian
>> > product.
>> > SPARK-11389 SQL Execution Using Off-Heap Memory - Support for
>> > configuring
>> > query execution to occur using off-heap memory to avoid GC overhead
>> > SPARK-10978 Datasource API Avoid Double Filter - When implementing a
>> > datasource with filter pushdown, developers can now tell Spark SQL to
>> > avoid double-evaluating a pushed-down filter.
>> > SPARK-4849  Advanced Layout of Cached Data - Storing partitioning and
>> > ordering schemes in the in-memory table scan, and adding distributeBy and
>> > localSort to the DataFrame API
>> > SPARK-9858  Adaptive query execution - Initial support for automatically
>> > selecting the number of reducers for joins and aggregations.
>> > SPARK-9241  Improved query planner for queries having distinct
>> > aggregations
>> > - Query plans of distinct aggregations are more robust when distinct
>> > columns
>> > have high cardinality.
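>> >
>> > As a quick illustration of two of the items above (the Dataset API and
>> > SQL queries on files), here is a minimal Scala sketch. It assumes a
>> > spark-shell session (so sqlContext is already defined); the file path
>> > and the Person class are made up for the example:
>> >
>> >   import sqlContext.implicits._
>> >
>> >   // Dataset API (SPARK-9999): a typed view over a DataFrame
>> >   case class Person(name: String, age: Long)
>> >   val people = sqlContext.read.json("/tmp/people.json").as[Person]
>> >   val adultNames = people.filter(_.age >= 21).map(_.name)
>> >
>> >   // SQL queries on files (SPARK-11197): query a file directly,
>> >   // without registering a temporary table first
>> >   sqlContext.sql("SELECT * FROM json.`/tmp/people.json`").show()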
>> >
>> > Spark Streaming
>> >
>> > API Updates
>> >
>> > SPARK-2629  New improved state management - mapWithState, a DStream
>> > transformation for stateful stream processing, supersedes updateStateByKey
>> > in functionality and performance. See the short sketch after this list.
>> > SPARK-11198 Kinesis record deaggregation - Kinesis streams have been
>> > upgraded to use KCL 1.4.0 and support transparent deaggregation of
>> > KPL-aggregated records.
>> > SPARK-10891 Kinesis message handler function - Allows an arbitrary function
>> > to be applied to a Kinesis record in the Kinesis receiver, to customize
>> > what data is stored in memory.
>> > SPARK-6328  Python Streaming Listener API - Get streaming statistics
>> > (scheduling delays, batch processing times, etc.) in streaming.
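>> >
>> > A minimal sketch of the mapWithState API mentioned above, assuming a
>> > StreamingContext is already set up and wordCounts is a DStream of
>> > (word, count) pairs (both are assumptions for the example):
>> >
>> >   import org.apache.spark.streaming._
>> >
>> >   // (key, new value, running state) => record to emit downstream
>> >   val mappingFunc = (word: String, one: Option[Int], state: State[Int]) => {
>> >     val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
>> >     state.update(sum)
>> >     (word, sum)
>> >   }
>> >
>> >   val runningCounts = wordCounts.mapWithState(StateSpec.function(mappingFunc))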
>> >
>> > UI Improvements
>> >
>> > Made failures visible in the streaming tab, in the timelines, batch
>> > list,
>> > and batch details page.
>> > Made output operations visible in the streaming tab as progress bars.
>> >
>> > MLlib
>> >
>> > New algorithms/models
>> >
>> > SPARK-8518  Survival analysis - Log-linear model for survival analysis
>> > SPARK-9834  Normal equation for least squares - Normal equation solver,
>> > providing R-like model summary statistics
>> > SPARK-3147  Online hypothesis testing - A/B testing in the Spark
>> > Streaming
>> > framework
>> > SPARK-9930  New feature transformers - ChiSqSelector,
>> > QuantileDiscretizer,
>> > SQL transformer
>> > SPARK-6517  Bisecting K-Means clustering - Fast top-down clustering
>> > variant
>> > of K-Means
>> >
>> > API improvements
>> >
>> > ML Pipelines
>> >
>> > SPARK-6725  Pipeline persistence - Save/load for ML Pipelines, with partial
>> > coverage of spark.ml algorithms
>> > SPARK-5565  LDA in ML Pipelines - API for Latent Dirichlet Allocation in
>> > ML
>> > Pipelines
>> >
>> > R API
>> >
>> > SPARK-9836  R-like statistics for GLMs - (Partial) R-like stats for
>> > ordinary
>> > least squares via summary(model)
>> > SPARK-9681  Feature interactions in R formula - Interaction operator ":"
>> > in
>> > R formula
>> >
>> > Python API - Many improvements to Python API to approach feature parity
>> >
>> > Misc improvements
>> >
>> > SPARK-7685 , SPARK-9642  Instance weights for GLMs - Logistic and Linear
>> > Regression can take instance weights
>> > SPARK-10384, SPARK-10385 Univariate and bivariate statistics in
>> > DataFrames -
>> > Variance, stddev, correlations, etc.
>> > SPARK-10117 LIBSVM data source - LIBSVM as a SQL data source
>> >
>> > Documentation improvements
>> >
>> > SPARK-7751  @since versions - Documentation includes initial version
>> > when
>> > classes and methods were added
>> > SPARK-11337 Testable example code - Automated testing for code in user
>> > guide
>> > examples
>> >
>> > Deprecations
>> >
>> > In spark.mllib.clustering.KMeans, the "runs" parameter has been
>> > deprecated.
>> > In spark.ml.classification.LogisticRegressionModel and
>> > spark.ml.regression.LinearRegressionModel, the "weights" field has been
>> > deprecated, in favor of the new name "coefficients." This helps
>> > disambiguate
>> > from instance (row) weights given to algorithms.
>> >
>> > Changes of behavior
>> >
>> > spark.mllib.tree.GradientBoostedTrees validationTol has changed
>> > semantics in
>> > 1.6. Previously, it was a threshold for absolute change in error. Now,
>> > it
>> > resembles the behavior of GradientDescent convergenceTol: For large
>> > errors,
>> > it uses relative error (relative to the previous error); for small
>> > errors (<
>> > 0.01), it uses absolute error.
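>> >
>> > Illustratively (this is not the actual MLlib code, just the rule as
>> > described above, written out in Scala):
>> >
>> >   // 1.6 semantics: compare the improvement in error against
>> >   // validationTol, in absolute terms for small errors and in
>> >   // relative terms otherwise
>> >   def improvedEnough(prevErr: Double, currErr: Double, tol: Double) =
>> >     if (prevErr < 0.01) prevErr - currErr > tol
>> >     else (prevErr - currErr) / prevErr > tol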
>> > spark.ml.feature.RegexTokenizer: Previously, it did not convert strings
>> > to
>> > lowercase before tokenizing. Now, it converts to lowercase by default,
>> > with
>> > an option not to. This matches the behavior of the simpler Tokenizer
>> > transformer.
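>> >
>> > For example, a sketch of keeping the old behavior via the new option
>> > (assuming the usual ML param setter naming, setToLowercase):
>> >
>> >   import org.apache.spark.ml.feature.RegexTokenizer
>> >
>> >   // preserve case, as RegexTokenizer did before 1.6
>> >   val tokenizer = new RegexTokenizer()
>> >     .setInputCol("text")
>> >     .setOutputCol("words")
>> >     .setToLowercase(false)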
>> > Spark SQL's partition discovery has been changed to only discover
>> > partition
>> > directories that are children of the given path. (i.e. if
>> > path="/my/data/x=1" then x=1 will no longer be considered a partition
>> > but
>> > only children of x=1.) This behavior can be overridden by manually
>> > specifying the basePath that partitioning discovery should start with
>> > (SPARK-11678).
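>> >
>> > For example, with the layout described above:
>> >
>> >   // setting basePath tells partition discovery where to start, so
>> >   // x=1 is still picked up as a partition column
>> >   val df = sqlContext.read
>> >     .option("basePath", "/my/data")
>> >     .parquet("/my/data/x=1")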
>> > When casting a value of an integral type to timestamp (e.g. casting a
>> > long
>> > value to timestamp), the value is treated as being in seconds instead of
>> > milliseconds (SPARK-11724).
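>> >
>> > For example, in the shell:
>> >
>> >   // the integral value is now interpreted as seconds since the epoch
>> >   sqlContext.sql("SELECT CAST(1450000000 AS TIMESTAMP)").show()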
>> > With the improved query planner for queries having distinct aggregations
>> > (SPARK-9241), the plan of a query having a single distinct aggregation
>> > has
>> > been changed to a more robust version. To switch back to the plan
>> > generated
>> > by Spark 1.5's planner, please set
>> > spark.sql.specializeSingleDistinctAggPlanning to true (SPARK-12077).
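>> >
>> > That is, to get the 1.5-style plan back:
>> >
>> >   sqlContext.setConf("spark.sql.specializeSingleDistinctAggPlanning", "true")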
>>
>
