Cutting RC2 now.

On Thu, Dec 10, 2015 at 12:59 PM, Michael Armbrust <mich...@databricks.com>
wrote:

> We are getting close to merging patches for SPARK-12155
> <https://issues.apache.org/jira/browse/SPARK-12155> and SPARK-12253
> <https://issues.apache.org/jira/browse/SPARK-12253>.  I'll be cutting RC2
> shortly after that.
>
> Michael
>
> On Tue, Dec 8, 2015 at 10:31 AM, Michael Armbrust <mich...@databricks.com>
> wrote:
>
>> An update: the vote fails due to the -1.   I'll post another RC as soon
>> as we've resolved these issues.  In the meantime, I encourage people to
>> continue testing and post any problems they encounter here.
>>
>> On Sun, Dec 6, 2015 at 6:24 PM, Yin Huai <yh...@databricks.com> wrote:
>>
>>> -1
>>>
>>> Two blocker bugs have been found since this RC was cut.
>>> https://issues.apache.org/jira/browse/SPARK-12089 can cause data
>>> corruption when an external sorter spills data.
>>> https://issues.apache.org/jira/browse/SPARK-12155 can prevent tasks
>>> from acquiring memory even when the executor could free memory by
>>> evicting storage memory.
>>>
>>> https://issues.apache.org/jira/browse/SPARK-12089 has been fixed. We
>>> are still working on https://issues.apache.org/jira/browse/SPARK-12155.
>>>
>>> On Fri, Dec 4, 2015 at 3:04 PM, Mark Hamstra <m...@clearstorydata.com>
>>> wrote:
>>>
>>>> 0
>>>>
>>>> Currently figuring out who is responsible for a regression that I am
>>>> seeing in some user-code ScalaUDFs that make use of Timestamps: a NULL
>>>> read from a CSV file via TestHive#registerTestTable now produces
>>>> 1969-12-31 23:59:59.999999 instead of null.
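>>>>
>>>> Roughly, the pattern looks like this (a simplified sketch, not the
>>>> actual user code; the UDF name, column, and table are made up):
>>>>
>>>>   sqlContext.udf.register("tsToString",
>>>>     (ts: java.sql.Timestamp) => if (ts == null) null else ts.toString)
>>>>   sqlContext.sql("SELECT tsToString(ts_col) FROM test_table").show()
>>>>
>>>> Previously the NULL input rows came back as null; on this RC they come
>>>> back as the timestamp above.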
>>>>
>>>> On Thu, Dec 3, 2015 at 1:57 PM, Sean Owen <so...@cloudera.com> wrote:
>>>>
>>>>> Licenses and signature are all fine.
>>>>>
>>>>> Docker integration tests consistently fail for me with Java 7 / Ubuntu
>>>>> and "-Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver"
>>>>>
>>>>> *** RUN ABORTED ***
>>>>>   java.lang.NoSuchMethodError:
>>>>>
>>>>> org.apache.http.impl.client.HttpClientBuilder.setConnectionManagerShared(Z)Lorg/apache/http/impl/client/HttpClientBuilder;
>>>>>   at
>>>>> org.glassfish.jersey.apache.connector.ApacheConnector.<init>(ApacheConnector.java:240)
>>>>>   at
>>>>> org.glassfish.jersey.apache.connector.ApacheConnectorProvider.getConnector(ApacheConnectorProvider.java:115)
>>>>>   at
>>>>> org.glassfish.jersey.client.ClientConfig$State.initRuntime(ClientConfig.java:418)
>>>>>   at
>>>>> org.glassfish.jersey.client.ClientConfig$State.access$000(ClientConfig.java:88)
>>>>>   at
>>>>> org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:120)
>>>>>   at
>>>>> org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:117)
>>>>>   at
>>>>> org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(Values.java:340)
>>>>>   at
>>>>> org.glassfish.jersey.client.ClientConfig.getRuntime(ClientConfig.java:726)
>>>>>   at
>>>>> org.glassfish.jersey.client.ClientRequest.getConfiguration(ClientRequest.java:285)
>>>>>   at
>>>>> org.glassfish.jersey.client.JerseyInvocation.validateHttpMethodAndEntity(JerseyInvocation.java:126)
>>>>>
>>>>> I also get this failure consistently:
>>>>>
>>>>> DirectKafkaStreamSuite
>>>>> - offset recovery *** FAILED ***
>>>>>   recoveredOffsetRanges.forall(((or: (org.apache.spark.streaming.Time,
>>>>> Array[org.apache.spark.streaming.kafka.OffsetRange])) =>
>>>>>
>>>>> earlierOffsetRangesAsSets.contains(scala.Tuple2.apply[org.apache.spark.streaming.Time,
>>>>>
>>>>> scala.collection.immutable.Set[org.apache.spark.streaming.kafka.OffsetRange]](or._1,
>>>>>
>>>>> scala.this.Predef.refArrayOps[org.apache.spark.streaming.kafka.OffsetRange](or._2).toSet[org.apache.spark.streaming.kafka.OffsetRange]))))
>>>>> was false Recovered ranges are not the same as the ones generated
>>>>> (DirectKafkaStreamSuite.scala:301)
>>>>>
>>>>> On Wed, Dec 2, 2015 at 8:26 PM, Michael Armbrust <
>>>>> mich...@databricks.com> wrote:
>>>>> > Please vote on releasing the following candidate as Apache Spark
>>>>> version
>>>>> > 1.6.0!
>>>>> >
>>>>> > The vote is open until Saturday, December 5, 2015 at 21:00 UTC and
>>>>> passes if
>>>>> > a majority of at least 3 +1 PMC votes are cast.
>>>>> >
>>>>> > [ ] +1 Release this package as Apache Spark 1.6.0
>>>>> > [ ] -1 Do not release this package because ...
>>>>> >
>>>>> > To learn more about Apache Spark, please see
>>>>> http://spark.apache.org/
>>>>> >
>>>>> > The tag to be voted on is v1.6.0-rc1
>>>>> > (bf525845cef159d2d4c9f4d64e158f037179b5c4)
>>>>> >
>>>>> > The release files, including signatures, digests, etc. can be found
>>>>> at:
>>>>> >
>>>>> http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-rc1-bin/
>>>>> >
>>>>> > Release artifacts are signed with the following key:
>>>>> > https://people.apache.org/keys/committer/pwendell.asc
>>>>> >
>>>>> > The staging repository for this release can be found at:
>>>>> >
>>>>> https://repository.apache.org/content/repositories/orgapachespark-1165/
>>>>> >
>>>>> > The test repository (versioned as v1.6.0-rc1) for this release can
>>>>> be found
>>>>> > at:
>>>>> >
>>>>> https://repository.apache.org/content/repositories/orgapachespark-1164/
>>>>> >
>>>>> > The documentation corresponding to this release can be found at:
>>>>> >
>>>>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc1-docs/
>>>>> >
>>>>> >
>>>>> > =======================================
>>>>> > == How can I help test this release? ==
>>>>> > =======================================
>>>>> > If you are a Spark user, you can help us test this release by taking
>>>>> an
>>>>> > existing Spark workload and running it on this release candidate, then
>>>>> > reporting any regressions.
>>>>> >
>>>>> > ================================================
>>>>> > == What justifies a -1 vote for this release? ==
>>>>> > ================================================
>>>>> > This vote is happening towards the end of the 1.6 QA period, so -1
>>>>> votes
>>>>> > should only occur for significant regressions from 1.5. Bugs already
>>>>> present
>>>>> > in 1.5, minor regressions, or bugs related to new features will not
>>>>> block
>>>>> > this release.
>>>>> >
>>>>> > ===============================================================
>>>>> > == What should happen to JIRA tickets still targeting 1.6.0? ==
>>>>> > ===============================================================
>>>>> > 1. It is OK for documentation patches to target 1.6.0 and still go
>>>>> into
>>>>> > branch-1.6, since documentation will be published separately from
>>>>> the
>>>>> > release.
>>>>> > 2. New features for non-alpha-modules should target 1.7+.
>>>>> > 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the
>>>>> target
>>>>> > version.
>>>>> >
>>>>> >
>>>>> > ==================================================
>>>>> > == Major changes to help you focus your testing ==
>>>>> > ==================================================
>>>>> >
>>>>> > Spark SQL
>>>>> >
>>>>> > SPARK-10810 Session Management - The ability to create multiple
>>>>> isolated SQL
>>>>> > Contexts that have their own configuration and default database.
>>>>> This is
>>>>> > turned on by default in the thrift server.
>>>>> > SPARK-9999  Dataset API - A type-safe API (similar to RDDs) that
>>>>> performs
>>>>> > many operations on serialized binary data and code generation (i.e.
>>>>> Project
>>>>> > Tungsten).
>>>>> > SPARK-10000 Unified Memory Management - Shared memory for execution
>>>>> and
>>>>> > caching instead of exclusive division of the regions.
>>>>> > SPARK-11197 SQL Queries on Files - Concise syntax for running SQL
>>>>> queries
>>>>> > over files of any supported format without registering a table (see
>>>>> > the short sketch after this list).
>>>>> > SPARK-11745 Reading non-standard JSON files - Added options to read
>>>>> > non-standard JSON files (e.g. single-quotes, unquoted attributes)
>>>>> > SPARK-10412 Per-operator Metrics for SQL Execution - Display
>>>>> statistics on a
>>>>> > per-operator basis for memory usage and spilled data size.
>>>>> > SPARK-11329 Star (*) expansion for StructTypes - Makes it easier to
>>>>> nest and
>>>>> > unnest arbitrary numbers of columns
>>>>> > SPARK-10917, SPARK-11149 In-memory Columnar Cache Performance -
>>>>> Significant
>>>>> > (up to 14x) speed up when caching data that contains complex types in
>>>>> > DataFrames or SQL.
>>>>> > SPARK-11111 Fast null-safe joins - Joins using null-safe equality
>>>>> (<=>) will
>>>>> > now execute using SortMergeJoin instead of computing a cartesian
>>>>> > product (also covered in the sketch after this list).
>>>>> > SPARK-11389 SQL Execution Using Off-Heap Memory - Support for
>>>>> configuring
>>>>> > query execution to occur using off-heap memory to avoid GC overhead
>>>>> > SPARK-10978 Datasource API Avoid Double Filter - When implementing a
>>>>> > datasource with filter pushdown, developers can now tell Spark SQL
>>>>> to avoid
>>>>> > double evaluating a pushed-down filter.
>>>>> > SPARK-4849  Advanced Layout of Cached Data - storing partitioning and
>>>>> > ordering schemes in the in-memory table scan, and adding distributeBy and
>>>>> > localSort to the DataFrame API
>>>>> > SPARK-9858  Adaptive query execution - Initial support for
>>>>> automatically
>>>>> > selecting the number of reducers for joins and aggregations.
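>>>>> >
>>>>> > A quick way to exercise a couple of the items above from the 1.6
>>>>> > spark-shell (a rough, untested sketch; the paths, column names, and
>>>>> > the left/right DataFrames are placeholders):
>>>>> >
>>>>> >   // SPARK-11197: run SQL directly over a file, no table registration
>>>>> >   sqlContext.sql(
>>>>> >     "SELECT count(*) FROM parquet.`/path/to/some.parquet`").show()
>>>>> >
>>>>> >   // SPARK-11111: a null-safe equality join should now be planned as
>>>>> >   // a SortMergeJoin rather than a cartesian product
>>>>> >   left.join(right, left("id") <=> right("id")).explain()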
>>>>> >
>>>>> > Spark Streaming
>>>>> >
>>>>> > API Updates
>>>>> >
>>>>> > SPARK-2629  New improved state management - trackStateByKey - a
>>>>> DStream
>>>>> > transformation for stateful stream processing, supersedes
>>>>> updateStateByKey
>>>>> > in functionality and performance.
>>>>> > SPARK-11198 Kinesis record deaggregation - Kinesis streams have been
>>>>> > upgraded to use KCL 1.4.0 and now support transparent deaggregation of
>>>>> > KPL-aggregated records.
>>>>> > SPARK-10891 Kinesis message handler function - Allows an arbitrary
>>>>> function to
>>>>> > be applied to a Kinesis record in the Kinesis receiver, to customize
>>>>> > what data is stored in memory.
>>>>> > SPARK-6328  Python Streaming Listener API - Get streaming statistics
>>>>> > (scheduling delays, batch processing times, etc.) in streaming.
>>>>> >
>>>>> > UI Improvements
>>>>> >
>>>>> > Made failures visible in the streaming tab, in the timelines, batch
>>>>> list,
>>>>> > and batch details page.
>>>>> > Made output operations visible in the streaming tab as progress bars
>>>>> >
>>>>> > MLlib
>>>>> >
>>>>> > New algorithms/models
>>>>> >
>>>>> > SPARK-8518  Survival analysis - Log-linear model for survival
>>>>> analysis
>>>>> > SPARK-9834  Normal equation for least squares - Normal equation
>>>>> solver,
>>>>> > providing R-like model summary statistics
>>>>> > SPARK-3147  Online hypothesis testing - A/B testing in the Spark
>>>>> Streaming
>>>>> > framework
>>>>> > SPARK-9930  New feature transformers - ChiSqSelector,
>>>>> QuantileDiscretizer,
>>>>> > SQL transformer
>>>>> > SPARK-6517  Bisecting K-Means clustering - Fast top-down clustering
>>>>> variant
>>>>> > of K-Means
>>>>> >
>>>>> > API improvements
>>>>> >
>>>>> > ML Pipelines
>>>>> >
>>>>> > SPARK-6725  Pipeline persistence - Save/load for ML Pipelines, with
>>>>> partial
>>>>> > coverage of spark.ml algorithms
>>>>> > SPARK-5565  LDA in ML Pipelines - API for Latent Dirichlet
>>>>> Allocation in ML
>>>>> > Pipelines
>>>>> >
>>>>> > R API
>>>>> >
>>>>> > SPARK-9836  R-like statistics for GLMs - (Partial) R-like stats for
>>>>> ordinary
>>>>> > least squares via summary(model)
>>>>> > SPARK-9681  Feature interactions in R formula - Interaction operator
>>>>> ":" in
>>>>> > R formula
>>>>> >
>>>>> > Python API - Many improvements to Python API to approach feature
>>>>> parity
>>>>> >
>>>>> > Misc improvements
>>>>> >
>>>>> > SPARK-7685 , SPARK-9642  Instance weights for GLMs - Logistic and
>>>>> Linear
>>>>> > Regression can take instance weights
>>>>> > SPARK-10384, SPARK-10385 Univariate and bivariate statistics in
>>>>> DataFrames -
>>>>> > Variance, stddev, correlations, etc.
>>>>> > SPARK-10117 LIBSVM data source - LIBSVM as a SQL data source
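>>>>> >
>>>>> > For example, the new LIBSVM source can be exercised like this (rough
>>>>> > sketch from the spark-shell; the path is a placeholder):
>>>>> >
>>>>> >   val df = sqlContext.read.format("libsvm")
>>>>> >     .load("/path/to/sample_libsvm_data.txt")
>>>>> >   df.printSchema()  // columns: label (double), features (vector)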
>>>>> >
>>>>> > Documentation improvements
>>>>> >
>>>>> > SPARK-7751  @since versions - Documentation includes initial version
>>>>> when
>>>>> > classes and methods were added
>>>>> > SPARK-11337 Testable example code - Automated testing for code in
>>>>> user guide
>>>>> > examples
>>>>> >
>>>>> > Deprecations
>>>>> >
>>>>> > In spark.mllib.clustering.KMeans, the "runs" parameter has been
>>>>> deprecated.
>>>>> > In spark.ml.classification.LogisticRegressionModel and
>>>>> > spark.ml.regression.LinearRegressionModel, the "weights" field has
>>>>> been
>>>>> > deprecated, in favor of the new name "coefficients." This helps
>>>>> disambiguate
>>>>> > from instance (row) weights given to algorithms.
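>>>>> >
>>>>> > For example (a small sketch, assuming a fitted
>>>>> > spark.ml.classification.LogisticRegressionModel named lrModel):
>>>>> >
>>>>> >   lrModel.weights       // deprecated in 1.6
>>>>> >   lrModel.coefficients  // new, preferred name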
>>>>> >
>>>>> > Changes of behavior
>>>>> >
>>>>> > spark.mllib.tree.GradientBoostedTrees validationTol has changed
>>>>> semantics in
>>>>> > 1.6. Previously, it was a threshold for absolute change in error.
>>>>> Now, it
>>>>> > resembles the behavior of GradientDescent convergenceTol: For large
>>>>> errors,
>>>>> > it uses relative error (relative to the previous error); for small
>>>>> errors (<
>>>>> > 0.01), it uses absolute error.
>>>>> > spark.ml.feature.RegexTokenizer: Previously, it did not convert
>>>>> strings to
>>>>> > lowercase before tokenizing. Now, it converts to lowercase by
>>>>> default, with
>>>>> > an option not to. This matches the behavior of the simpler Tokenizer
>>>>> > transformer.
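>>>>> >
>>>>> > To keep the old, non-lowercasing behavior, something like this should
>>>>> > work (a sketch; assuming the 1.6 setter is named setToLowercase):
>>>>> >
>>>>> >   import org.apache.spark.ml.feature.RegexTokenizer
>>>>> >   val tokenizer = new RegexTokenizer()
>>>>> >     .setInputCol("text").setOutputCol("words")
>>>>> >     .setToLowercase(false)  // 1.6 lowercases by default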
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>>>>
>>>>>
>>>>
>>>
>>
>
