Thanks, Sean, for sending me the logs offline. It turns out the tests are failing again, for reasons unrelated to Spark. I have filed https://issues.apache.org/jira/browse/SPARK-12426 for that with some details. In the meantime, I agree with Sean: these tests should be disabled. And, again, I don't think these failures warrant blocking the release.
Mark

On Fri, Dec 18, 2015 at 9:32 AM, Sean Owen <so...@cloudera.com> wrote:
> Yes, that's what I mean. If they're not quite working, let's disable
> them, but first, we have to rule out that I'm not just missing some
> requirement.
>
> Functionally, it's not worth blocking the release. It seems like bad
> form to release with tests that always fail for a non-trivial number
> of users, but we have to establish that. If it's something with an
> easy fix (or needs disabling) and another RC needs to be baked, it
> might be worth including.
>
> Logs coming offline
>
> On Fri, Dec 18, 2015 at 5:30 PM, Mark Grover <m...@apache.org> wrote:
> > Sean,
> > Are you referring to the docker integration tests? If so, they were
> > disabled for the majority of the release. I recently worked on them
> > (SPARK-11796) and, once that got committed, the tests were re-enabled
> > in Spark builds. I am not sure what OSs the test builds use, but they
> > should be passing there too.
> >
> > During my work, I tested on Ubuntu Precise and they worked. If you
> > could share the logs with me offline, I could take a look.
> > Alternatively, I can try to get an Ubuntu 15 instance. However, given
> > the history of these tests, I personally don't think it makes sense
> > to block the release based on them not running on Ubuntu 15.
> >
> > On Fri, Dec 18, 2015 at 9:22 AM, Sean Owen <so...@cloudera.com> wrote:
> >>
> >> For me, mostly the same as before: tests are mostly passing, but I can
> >> never get the docker tests to pass. If anyone knows a special profile
> >> or package that needs to be enabled, I can try that and/or
> >> fix/document it. Just wondering if it's me.
> >>
> >> I'm on Java 7 + Ubuntu 15.10, with -Pyarn -Phive -Phive-thriftserver
> >> -Phadoop-2.6
> >>
> >> On Wed, Dec 16, 2015 at 9:32 PM, Michael Armbrust
> >> <mich...@databricks.com> wrote:
> >> > Please vote on releasing the following candidate as Apache Spark
> >> > version 1.6.0!
> >> >
> >> > The vote is open until Saturday, December 19, 2015 at 18:00 UTC and
> >> > passes if a majority of at least 3 +1 PMC votes are cast.
> >> >
> >> > [ ] +1 Release this package as Apache Spark 1.6.0
> >> > [ ] -1 Do not release this package because ...
> >> >
> >> > To learn more about Apache Spark, please see http://spark.apache.org/
> >> >
> >> > The tag to be voted on is v1.6.0-rc3
> >> > (168c89e07c51fa24b0bb88582c739cec0acb44d7)
> >> >
> >> > The release files, including signatures, digests, etc. can be found at:
> >> > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-bin/
> >> >
> >> > Release artifacts are signed with the following key:
> >> > https://people.apache.org/keys/committer/pwendell.asc
> >> >
> >> > The staging repository for this release can be found at:
> >> > https://repository.apache.org/content/repositories/orgapachespark-1174/
> >> >
> >> > The test repository (versioned as v1.6.0-rc3) for this release can
> >> > be found at:
> >> > https://repository.apache.org/content/repositories/orgapachespark-1173/
> >> >
> >> > The documentation corresponding to this release can be found at:
> >> > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-docs/
> >> >
> >> > =======================================
> >> > == How can I help test this release? ==
> >> > =======================================
> >> > If you are a Spark user, you can help us test this release by taking
> >> > an existing Spark workload, running it on this release candidate,
> >> > and reporting any regressions.
> >> >
> >> > ================================================
> >> > == What justifies a -1 vote for this release? ==
> >> > ================================================
> >> > This vote is happening towards the end of the 1.6 QA period, so -1
> >> > votes should only occur for significant regressions from 1.5. Bugs
> >> > already present in 1.5, minor regressions, or bugs related to new
> >> > features will not block this release.
> >> >
> >> > ===============================================================
> >> > == What should happen to JIRA tickets still targeting 1.6.0? ==
> >> > ===============================================================
> >> > 1. It is OK for documentation patches to target 1.6.0 and still go
> >> >    into branch-1.6, since documentation will be published separately
> >> >    from the release.
> >> > 2. New features for non-alpha modules should target 1.7+.
> >> > 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the
> >> >    target version.
> >> >
> >> > ==================================================
> >> > == Major changes to help you focus your testing ==
> >> > ==================================================
> >> >
> >> > Notable changes since 1.6 RC2
> >> >
> >> > - SPARK_VERSION has been set correctly
> >> > - SPARK-12199 ML docs are publishing correctly
> >> > - SPARK-12345 Mesos cluster mode has been fixed
> >> >
> >> > Notable changes since 1.6 RC1
> >> >
> >> > Spark Streaming
> >> >
> >> > - SPARK-2629 trackStateByKey has been renamed to mapWithState
> >> >
> >> > Spark SQL
> >> >
> >> > - SPARK-12165, SPARK-12189 Fix bugs in eviction of storage memory by
> >> >   execution.
> >> > - SPARK-12258 Correct passing null into ScalaUDF.
> >> >
> >> > Notable Features Since 1.5
> >> >
> >> > Spark SQL
> >> >
> >> > - SPARK-11787 Parquet Performance - Improve Parquet scan performance
> >> >   when using flat schemas.
> >> > - SPARK-10810 Session Management - Isolated default database (i.e.
> >> >   USE mydb) even on shared clusters.
> >> > - SPARK-9999 Dataset API - A type-safe API (similar to RDDs) that
> >> >   performs many operations on serialized binary data and code
> >> >   generation (i.e. Project Tungsten).
> >> > - SPARK-10000 Unified Memory Management - Shared memory for execution
> >> >   and caching instead of exclusive division of the regions.
> >> > - SPARK-11197 SQL Queries on Files - Concise syntax for running SQL
> >> >   queries over files of any supported format without registering a
> >> >   table.
> >> > - SPARK-11745 Reading non-standard JSON files - Added options to read
> >> >   non-standard JSON files (e.g. single quotes, unquoted attributes).
> >> > - SPARK-10412 Per-operator Metrics for SQL Execution - Display
> >> >   statistics on a per-operator basis for memory usage and spilled
> >> >   data size.
> >> > - SPARK-11329 Star (*) expansion for StructTypes - Makes it easier to
> >> >   nest and unnest arbitrary numbers of columns.
> >> > - SPARK-10917, SPARK-11149 In-memory Columnar Cache Performance -
> >> >   Significant (up to 14x) speedup when caching data that contains
> >> >   complex types in DataFrames or SQL.
> >> > - SPARK-11111 Fast null-safe joins - Joins using null-safe equality
> >> >   (<=>) will now execute using SortMergeJoin instead of computing a
> >> >   cartesian product.
> >> > - SPARK-11389 SQL Execution Using Off-Heap Memory - Support for
> >> >   configuring query execution to occur using off-heap memory to
> >> >   avoid GC overhead.
> >> > - SPARK-10978 Datasource API Avoid Double Filter - When implementing
> >> >   a datasource with filter pushdown, developers can now tell Spark
> >> >   SQL to avoid double-evaluating a pushed-down filter.
> >> > - SPARK-4849 Advanced Layout of Cached Data - Storing partitioning
> >> >   and ordering schemes in in-memory table scan, and adding
> >> >   distributeBy and localSort to the DataFrame API.
> >> > - SPARK-9858 Adaptive query execution - Initial support for
> >> >   automatically selecting the number of reducers for joins and
> >> >   aggregations.
> >> > - SPARK-9241 Improved query planner for queries having distinct
> >> >   aggregations - Query plans of distinct aggregations are more
> >> >   robust when distinct columns have high cardinality.
> >> >
> >> > Spark Streaming
> >> >
> >> > API Updates
> >> >
> >> > - SPARK-2629 New improved state management - mapWithState - a DStream
> >> >   transformation for stateful stream processing; supersedes
> >> >   updateStateByKey in functionality and performance.
> >> > - SPARK-11198 Kinesis record deaggregation - Kinesis streams have
> >> >   been upgraded to use KCL 1.4.0 and support transparent
> >> >   deaggregation of KPL-aggregated records.
> >> > - SPARK-10891 Kinesis message handler function - Allows an arbitrary
> >> >   function to be applied to a Kinesis record in the Kinesis receiver
> >> >   to customize what data is stored in memory.
> >> > - SPARK-6328 Python Streaming Listener API - Get streaming statistics
> >> >   (scheduling delays, batch processing times, etc.) in streaming.
> >> >
> >> > UI Improvements
> >> >
> >> > - Made failures visible in the streaming tab, in the timelines,
> >> >   batch list, and batch details page.
> >> > - Made output operations visible in the streaming tab as progress
> >> >   bars.
> >> >
> >> > MLlib
> >> >
> >> > New algorithms/models
> >> >
> >> > - SPARK-8518 Survival analysis - Log-linear model for survival
> >> >   analysis.
> >> > - SPARK-9834 Normal equation for least squares - Normal equation
> >> >   solver, providing R-like model summary statistics.
> >> > - SPARK-3147 Online hypothesis testing - A/B testing in the Spark
> >> >   Streaming framework.
> >> > - SPARK-9930 New feature transformers - ChiSqSelector,
> >> >   QuantileDiscretizer, SQL transformer.
> >> > - SPARK-6517 Bisecting K-Means clustering - Fast top-down clustering
> >> >   variant of K-Means.
> >> >
> >> > API improvements
> >> >
> >> > ML Pipelines
> >> >
> >> > - SPARK-6725 Pipeline persistence - Save/load for ML Pipelines, with
> >> >   partial coverage of spark.ml algorithms.
> >> > - SPARK-5565 LDA in ML Pipelines - API for Latent Dirichlet
> >> >   Allocation in ML Pipelines.
> >> >
> >> > R API
> >> >
> >> > - SPARK-9836 R-like statistics for GLMs - (Partial) R-like stats for
> >> >   ordinary least squares via summary(model).
> >> > - SPARK-9681 Feature interactions in R formula - Interaction
> >> >   operator ":" in R formula.
> >> >
> >> > Python API - Many improvements to the Python API to approach feature
> >> > parity.
> >> >
> >> > Misc improvements
> >> >
> >> > - SPARK-7685, SPARK-9642 Instance weights for GLMs - Logistic and
> >> >   Linear Regression can take instance weights.
> >> > - SPARK-10384, SPARK-10385 Univariate and bivariate statistics in
> >> >   DataFrames - Variance, stddev, correlations, etc.
> >> > - SPARK-10117 LIBSVM data source - LIBSVM as a SQL data source.
> >> >
> >> > Documentation improvements
> >> >
> >> > - SPARK-7751 @since versions - Documentation includes the initial
> >> >   version in which classes and methods were added.
> >> > - SPARK-11337 Testable example code - Automated testing for code in
> >> >   user guide examples.
> >> >
> >> > Deprecations
> >> >
> >> > - In spark.mllib.clustering.KMeans, the "runs" parameter has been
> >> >   deprecated.
> >> > - In spark.ml.classification.LogisticRegressionModel and
> >> >   spark.ml.regression.LinearRegressionModel, the "weights" field has
> >> >   been deprecated in favor of the new name "coefficients." This
> >> >   helps disambiguate it from the instance (row) weights given to
> >> >   algorithms.
> >> >
> >> > Changes of behavior
> >> >
> >> > - spark.mllib.tree.GradientBoostedTrees validationTol has changed
> >> >   semantics in 1.6. Previously, it was a threshold for absolute
> >> >   change in error. Now, it resembles the behavior of GradientDescent
> >> >   convergenceTol: for large errors, it uses relative error (relative
> >> >   to the previous error); for small errors (< 0.01), it uses
> >> >   absolute error.
> >> > - spark.ml.feature.RegexTokenizer: Previously, it did not convert
> >> >   strings to lowercase before tokenizing. Now, it converts to
> >> >   lowercase by default, with an option not to. This matches the
> >> >   behavior of the simpler Tokenizer transformer.
> >> > - Spark SQL's partition discovery has been changed to only discover
> >> >   partition directories that are children of the given path (i.e.
> >> >   if path="/my/data/x=1", then x=1 will no longer be considered a
> >> >   partition, but only children of x=1 will). This behavior can be
> >> >   overridden by manually specifying the basePath that partition
> >> >   discovery should start with (SPARK-11678).
> >> > - When casting a value of an integral type to timestamp (e.g.
> >> >   casting a long value to timestamp), the value is treated as being
> >> >   in seconds instead of milliseconds (SPARK-11724).
> >> > - With the improved query planner for queries having distinct
> >> >   aggregations (SPARK-9241), the plan of a query having a single
> >> >   distinct aggregation has been changed to a more robust version. To
> >> >   switch back to the plan generated by Spark 1.5's planner, please
> >> >   set spark.sql.specializeSingleDistinctAggPlanning to true
> >> >   (SPARK-12077).
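
For anyone who wants to help test: below is a minimal sbt sketch for pointing an existing workload at the RC3 staging repository quoted above. Only the repository URL comes from the vote email; the resolver label and the assumption that the artifacts are staged under the plain 1.6.0 version string are mine.

    // build.sbt -- hedged sketch for building against the RC3 staging repo
    resolvers += "Spark 1.6.0 RC3 staging" at
      "https://repository.apache.org/content/repositories/orgapachespark-1174/"

    // Assumes the staged artifacts carry the plain 1.6.0 version string.
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0" % "provided"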
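
Since SPARK-2629 renames trackStateByKey to mapWithState, a small Scala sketch of the new API may help anyone exercising the streaming changes. The DStream name `events` and the running-count logic are illustrative assumptions, not from the thread.

    import org.apache.spark.streaming.{State, StateSpec}
    import org.apache.spark.streaming.dstream.DStream

    // Hypothetical input: a stream of (word, count) pairs.
    def runningCounts(events: DStream[(String, Int)]): DStream[(String, Int)] = {
      val spec = StateSpec.function(
        (word: String, count: Option[Int], state: State[Int]) => {
          val sum = count.getOrElse(0) + state.getOption.getOrElse(0)
          state.update(sum) // persist the running sum for this key
          (word, sum)       // emit the updated pair downstream
        })
      events.mapWithState(spec)
    }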
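
Two of the Spark SQL items above are easy to smoke-test from a spark-shell. A hedged sketch, assuming a 1.6 SQLContext named sqlContext and two hypothetical DataFrames df1 and df2 that share a key column:

    // SPARK-11197: run SQL directly over a file, without registering a
    // table first. The path is a placeholder.
    val fromFile = sqlContext.sql(
      "SELECT * FROM parquet.`/path/to/events.parquet`")

    // SPARK-11111: a null-safe equality join should now plan a
    // SortMergeJoin instead of a cartesian product.
    val joined = df1.join(df2, df1("key") <=> df2("key"))
    joined.explain() // inspect the physical plan for SortMergeJoin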
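
The behavior changes are also straightforward to verify; the sketches below assume the same sqlContext, and the /my/data/x=1 layout comes from the notes themselves.

    // SPARK-11724: an integral value cast to timestamp is now read as
    // seconds, so this yields a 2015 date rather than a 1970 one.
    sqlContext.sql("SELECT CAST(1450000000 AS TIMESTAMP)").show()

    // SPARK-11678: given only the leaf path, x=1 is no longer discovered
    // as a partition column; supplying basePath restores the old behavior.
    val withPartition = sqlContext.read
      .option("basePath", "/my/data")
      .parquet("/my/data/x=1")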