-1

Two blocker bugs have been found after this RC.
https://issues.apache.org/jira/browse/SPARK-12089 can cause data corruption
when an external sorter spills data.
https://issues.apache.org/jira/browse/SPARK-12155 can prevent tasks from
acquiring memory even when the executor could in fact allocate it by
evicting storage memory.
https://issues.apache.org/jira/browse/SPARK-12089 has been fixed. We are
still working on https://issues.apache.org/jira/browse/SPARK-12155.

On Fri, Dec 4, 2015 at 3:04 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
> 0
>
> Currently figuring out who is responsible for the regression that I am
> seeing in some user code: ScalaUDFs that make use of Timestamps, where
> NULL from a CSV file read in via TestHive#registerTestTable is now
> producing 1969-12-31 23:59:59.999999 instead of null.
>
> On Thu, Dec 3, 2015 at 1:57 PM, Sean Owen <so...@cloudera.com> wrote:
>
>> Licenses and signatures are all fine.
>>
>> Docker integration tests consistently fail for me with Java 7 / Ubuntu
>> and "-Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver":
>>
>> *** RUN ABORTED ***
>> java.lang.NoSuchMethodError:
>> org.apache.http.impl.client.HttpClientBuilder.setConnectionManagerShared(Z)Lorg/apache/http/impl/client/HttpClientBuilder;
>>   at org.glassfish.jersey.apache.connector.ApacheConnector.<init>(ApacheConnector.java:240)
>>   at org.glassfish.jersey.apache.connector.ApacheConnectorProvider.getConnector(ApacheConnectorProvider.java:115)
>>   at org.glassfish.jersey.client.ClientConfig$State.initRuntime(ClientConfig.java:418)
>>   at org.glassfish.jersey.client.ClientConfig$State.access$000(ClientConfig.java:88)
>>   at org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:120)
>>   at org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:117)
>>   at org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(Values.java:340)
>>   at org.glassfish.jersey.client.ClientConfig.getRuntime(ClientConfig.java:726)
>>   at org.glassfish.jersey.client.ClientRequest.getConfiguration(ClientRequest.java:285)
>>   at org.glassfish.jersey.client.JerseyInvocation.validateHttpMethodAndEntity(JerseyInvocation.java:126)
>>
>> I also get this failure consistently:
>>
>> DirectKafkaStreamSuite
>> - offset recovery *** FAILED ***
>>   recoveredOffsetRanges.forall(((or: (org.apache.spark.streaming.Time,
>>   Array[org.apache.spark.streaming.kafka.OffsetRange])) =>
>>   earlierOffsetRangesAsSets.contains(scala.Tuple2.apply[org.apache.spark.streaming.Time,
>>   scala.collection.immutable.Set[org.apache.spark.streaming.kafka.OffsetRange]](or._1,
>>   scala.this.Predef.refArrayOps[org.apache.spark.streaming.kafka.OffsetRange](or._2).toSet[org.apache.spark.streaming.kafka.OffsetRange]))))
>>   was false Recovered ranges are not the same as the ones generated
>>   (DirectKafkaStreamSuite.scala:301)
>>
>> On Wed, Dec 2, 2015 at 8:26 PM, Michael Armbrust <mich...@databricks.com> wrote:
>> > Please vote on releasing the following candidate as Apache Spark
>> > version 1.6.0!
>> >
>> > The vote is open until Saturday, December 5, 2015 at 21:00 UTC and
>> > passes if a majority of at least 3 +1 PMC votes are cast.
>> >
>> > [ ] +1 Release this package as Apache Spark 1.6.0
>> > [ ] -1 Do not release this package because ...
>> >
>> > To learn more about Apache Spark, please see http://spark.apache.org/
>> >
>> > The tag to be voted on is v1.6.0-rc1
>> > (bf525845cef159d2d4c9f4d64e158f037179b5c4)
>> >
>> > The release files, including signatures, digests, etc., can be found at:
>> > http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-rc1-bin/
>> >
>> > Release artifacts are signed with the following key:
>> > https://people.apache.org/keys/committer/pwendell.asc
>> >
>> > The staging repository for this release can be found at:
>> > https://repository.apache.org/content/repositories/orgapachespark-1165/
>> >
>> > The test repository (versioned as v1.6.0-rc1) for this release can be
>> > found at:
>> > https://repository.apache.org/content/repositories/orgapachespark-1164/
>> >
>> > The documentation corresponding to this release can be found at:
>> > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc1-docs/
>> >
>> > =======================================
>> > == How can I help test this release? ==
>> > =======================================
>> > If you are a Spark user, you can help us test this release by taking
>> > an existing Spark workload and running it on this release candidate,
>> > then reporting any regressions.
>> >
>> > ================================================
>> > == What justifies a -1 vote for this release? ==
>> > ================================================
>> > This vote is happening towards the end of the 1.6 QA period, so -1
>> > votes should only occur for significant regressions from 1.5. Bugs
>> > already present in 1.5, minor regressions, or bugs related to new
>> > features will not block this release.
>> >
>> > ===============================================================
>> > == What should happen to JIRA tickets still targeting 1.6.0? ==
>> > ===============================================================
>> > 1. It is OK for documentation patches to target 1.6.0 and still go
>> > into branch-1.6, since documentation will be published separately
>> > from the release.
>> > 2. New features for non-alpha modules should target 1.7+.
>> > 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the
>> > target version.
>> >
>> > ==================================================
>> > == Major changes to help you focus your testing ==
>> > ==================================================
>> >
>> > Spark SQL
>> >
>> > SPARK-10810 Session Management - The ability to create multiple
>> > isolated SQL Contexts that have their own configuration and default
>> > database. This is turned on by default in the thrift server.
>> > SPARK-9999 Dataset API - A type-safe API (similar to RDDs) that
>> > performs many operations on serialized binary data and code
>> > generation (i.e. Project Tungsten).
>> > SPARK-10000 Unified Memory Management - Shared memory for execution
>> > and caching instead of exclusive division of the regions.
>> > SPARK-11197 SQL Queries on Files - Concise syntax for running SQL
>> > queries over files of any supported format without registering a
>> > table.
>> > SPARK-11745 Reading non-standard JSON files - Added options to read
>> > non-standard JSON files (e.g. single quotes, unquoted attributes).
>> > SPARK-10412 Per-operator Metrics for SQL Execution - Display
>> > statistics on a per-operator basis for memory usage and spilled data
>> > size.
>> > SPARK-11329 Star (*) expansion for StructTypes - Makes it easier to
>> > nest and unnest arbitrary numbers of columns.
>> > SPARK-10917, SPARK-11149 In-memory Columnar Cache Performance -
>> > Significant (up to 14x) speedup when caching data that contains
>> > complex types in DataFrames or SQL.
>> > SPARK-11111 Fast null-safe joins - Joins using null-safe equality
>> > (<=>) will now execute using SortMergeJoin instead of computing a
>> > Cartesian product.
>> > SPARK-11389 SQL Execution Using Off-Heap Memory - Support for
>> > configuring query execution to occur using off-heap memory to avoid
>> > GC overhead.
>> > SPARK-10978 Datasource API Avoid Double Filter - When implementing a
>> > datasource with filter pushdown, developers can now tell Spark SQL to
>> > avoid double-evaluating a pushed-down filter.
>> > SPARK-4849 Advanced Layout of Cached Data - Storing partitioning and
>> > ordering schemes in the in-memory table scan, and adding distributeBy
>> > and localSort to the DataFrame API.
>> > SPARK-9858 Adaptive query execution - Initial support for
>> > automatically selecting the number of reducers for joins and
>> > aggregations.
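For anyone smoke-testing the Spark SQL changes above, a minimal sketch
(Scala, run in spark-shell against this RC, where sqlContext is
predefined). The file path and sample data are illustrative placeholders,
not anything from the release notes:

  // Dataset API (SPARK-9999): type-safe, RDD-like operations.
  import sqlContext.implicits._
  val ds = Seq(1, 2, 3).toDS()   // Dataset[Int] from a local collection
  ds.map(_ + 1).collect()        // Array(2, 3, 4)

  // SQL queries directly on files (SPARK-11197): query a file without
  // registering a table first. Substitute a Parquet file you actually have.
  sqlContext.sql("SELECT * FROM parquet.`/tmp/events.parquet`").show()

  // Fast null-safe joins (SPARK-11111): <=> should now plan a
  // SortMergeJoin rather than a Cartesian product; verify with explain().
  val left  = Seq((Some(1), "a"), (None: Option[Int], "b")).toDF("k", "v1")
  val right = Seq((Some(1), "x"), (None: Option[Int], "y")).toDF("k", "v2")
  left.join(right, left("k") <=> right("k")).explain()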
>> >
>> > Spark Streaming
>> >
>> > API Updates
>> >
>> > SPARK-2629 New improved state management - trackStateByKey, a DStream
>> > transformation for stateful stream processing that supersedes
>> > updateStateByKey in functionality and performance.
>> > SPARK-11198 Kinesis record deaggregation - Kinesis streams have been
>> > upgraded to use KCL 1.4.0 and support transparent deaggregation of
>> > KPL-aggregated records.
>> > SPARK-10891 Kinesis message handler function - Allows an arbitrary
>> > function to be applied to a Kinesis record in the Kinesis receiver to
>> > customize what data is stored in memory.
>> > SPARK-6328 Python Streaming Listener API - Get streaming statistics
>> > (scheduling delays, batch processing times, etc.) from Python.
>> >
>> > UI Improvements
>> >
>> > Made failures visible in the streaming tab, in the timelines, batch
>> > list, and batch details page.
>> > Made output operations visible in the streaming tab as progress bars.
>> >
>> > MLlib
>> >
>> > New algorithms/models
>> >
>> > SPARK-8518 Survival analysis - Log-linear model for survival analysis.
>> > SPARK-9834 Normal equation for least squares - Normal equation
>> > solver, providing R-like model summary statistics.
>> > SPARK-3147 Online hypothesis testing - A/B testing in the Spark
>> > Streaming framework.
>> > SPARK-9930 New feature transformers - ChiSqSelector,
>> > QuantileDiscretizer, SQL transformer.
>> > SPARK-6517 Bisecting K-Means clustering - Fast top-down clustering
>> > variant of K-Means.
>> >
>> > API improvements
>> >
>> > ML Pipelines
>> >
>> > SPARK-6725 Pipeline persistence - Save/load for ML Pipelines, with
>> > partial coverage of spark.ml algorithms.
>> > SPARK-5565 LDA in ML Pipelines - API for Latent Dirichlet Allocation
>> > in ML Pipelines.
>> >
>> > R API
>> >
>> > SPARK-9836 R-like statistics for GLMs - (Partial) R-like stats for
>> > ordinary least squares via summary(model).
>> > SPARK-9681 Feature interactions in R formula - Interaction operator
>> > ":" in R formula.
>> >
>> > Python API - Many improvements to the Python API to approach feature
>> > parity.
>> >
>> > Misc improvements
>> >
>> > SPARK-7685, SPARK-9642 Instance weights for GLMs - Logistic and
>> > Linear Regression can take instance weights.
>> > SPARK-10384, SPARK-10385 Univariate and bivariate statistics in
>> > DataFrames - Variance, stddev, correlations, etc.
>> > SPARK-10117 LIBSVM data source - LIBSVM as a SQL data source.
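In the same spirit, a sketch for the new LIBSVM data source (SPARK-10117);
sample_libsvm_data.txt ships under data/mllib in the Spark distribution,
but any LIBSVM-format file works:

  // LIBSVM as a SQL data source: produces a DataFrame with a double
  // "label" column and a vector "features" column.
  val libsvm = sqlContext.read
    .format("libsvm")
    .load("data/mllib/sample_libsvm_data.txt")
  libsvm.printSchema()
  libsvm.show(5)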
>> >
>> > Documentation improvements
>> >
>> > SPARK-7751 @since versions - Documentation includes the initial
>> > version in which classes and methods were added.
>> > SPARK-11337 Testable example code - Automated testing for code in
>> > user guide examples.
>> >
>> > Deprecations
>> >
>> > In spark.mllib.clustering.KMeans, the "runs" parameter has been
>> > deprecated.
>> > In spark.ml.classification.LogisticRegressionModel and
>> > spark.ml.regression.LinearRegressionModel, the "weights" field has
>> > been deprecated in favor of the new name "coefficients." This helps
>> > disambiguate it from instance (row) weights given to algorithms.
>> >
>> > Changes of behavior
>> >
>> > spark.mllib.tree.GradientBoostedTrees validationTol has changed
>> > semantics in 1.6. Previously, it was a threshold for absolute change
>> > in error. Now, it resembles the behavior of GradientDescent
>> > convergenceTol: for large errors, it uses relative error (relative to
>> > the previous error); for small errors (< 0.01), it uses absolute
>> > error.
>> > spark.ml.feature.RegexTokenizer: Previously, it did not convert
>> > strings to lowercase before tokenizing. Now, it converts to lowercase
>> > by default, with an option not to. This matches the behavior of the
>> > simpler Tokenizer transformer.
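Since the RegexTokenizer change above can silently alter results on
upgrade, one last sketch showing both behaviors; setToLowercase is the
1.6 switch for opting out, and the column names here are illustrative:

  import org.apache.spark.ml.feature.RegexTokenizer
  val text = sqlContext.createDataFrame(Seq((0, "Hello Spark"))).toDF("id", "sentence")

  // New 1.6 default: lowercase before tokenizing -> [hello, spark]
  val tok = new RegexTokenizer().setInputCol("sentence").setOutputCol("words")
  tok.transform(text).select("words").show(false)

  // Opt out to keep the pre-1.6 behavior -> [Hello, Spark]
  val tokNoLower = new RegexTokenizer()
    .setInputCol("sentence")
    .setOutputCol("words")
    .setToLowercase(false)
  tokNoLower.transform(text).select("words").show(false)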