Cutting RC2 now.

On Thu, Dec 10, 2015 at 12:59 PM, Michael Armbrust <mich...@databricks.com> wrote:
> We are getting close to merging patches for SPARK-12155
> <https://issues.apache.org/jira/browse/SPARK-12155> and SPARK-12253
> <https://issues.apache.org/jira/browse/SPARK-12253>. I'll be cutting RC2 shortly after that.
>
> Michael
>
> On Tue, Dec 8, 2015 at 10:31 AM, Michael Armbrust <mich...@databricks.com> wrote:
>
>> An update: the vote fails due to the -1. I'll post another RC as soon as we've resolved these issues. In the meantime I encourage people to continue testing and to post any problems they encounter here.
>>
>> On Sun, Dec 6, 2015 at 6:24 PM, Yin Huai <yh...@databricks.com> wrote:
>>
>>> -1
>>>
>>> Two blocker bugs have been found since this RC was cut.
>>> https://issues.apache.org/jira/browse/SPARK-12089 can cause data corruption when an external sorter spills data.
>>> https://issues.apache.org/jira/browse/SPARK-12155 can prevent tasks from acquiring memory even when the executor could free it by evicting storage memory.
>>>
>>> https://issues.apache.org/jira/browse/SPARK-12089 has been fixed. We are still working on https://issues.apache.org/jira/browse/SPARK-12155.
>>>
>>> On Fri, Dec 4, 2015 at 3:04 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>>
>>>> 0
>>>>
>>>> Currently figuring out which change is responsible for a regression I am seeing in some user-code ScalaUDFs that use Timestamps: a NULL read from a CSV file via TestHive#registerTestTable now produces 1969-12-31 23:59:59.999999 instead of null.
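A minimal, hypothetical sketch of the pattern described in that report, not the actual user code: the column names, data, and use of a plain SQLContext (rather than TestHive#registerTestTable and a CSV file) are assumptions made for illustration.

  import java.sql.Timestamp
  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext

  object TimestampUdfCheck {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("ts-udf-check"))
      val sqlContext = new SQLContext(sc)
      import sqlContext.implicits._

      // A ScalaUDF over a nullable Timestamp column; a NULL input should come back as null.
      sqlContext.udf.register("tsToString",
        (t: Timestamp) => if (t == null) null else t.toString)

      val df = Seq((1, new Timestamp(0L)), (2, null.asInstanceOf[Timestamp])).toDF("id", "ts")
      df.registerTempTable("events")

      // Row 2 is expected to print null rather than a sentinel like 1969-12-31 23:59:59.999999.
      sqlContext.sql("SELECT id, tsToString(ts) AS ts_str FROM events").show()

      sc.stop()
    }
  }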
>>>>
>>>> On Thu, Dec 3, 2015 at 1:57 PM, Sean Owen <so...@cloudera.com> wrote:
>>>>
>>>>> Licenses and signatures are all fine.
>>>>>
>>>>> Docker integration tests consistently fail for me with Java 7 / Ubuntu and "-Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver":
>>>>>
>>>>> *** RUN ABORTED ***
>>>>>   java.lang.NoSuchMethodError: org.apache.http.impl.client.HttpClientBuilder.setConnectionManagerShared(Z)Lorg/apache/http/impl/client/HttpClientBuilder;
>>>>>   at org.glassfish.jersey.apache.connector.ApacheConnector.<init>(ApacheConnector.java:240)
>>>>>   at org.glassfish.jersey.apache.connector.ApacheConnectorProvider.getConnector(ApacheConnectorProvider.java:115)
>>>>>   at org.glassfish.jersey.client.ClientConfig$State.initRuntime(ClientConfig.java:418)
>>>>>   at org.glassfish.jersey.client.ClientConfig$State.access$000(ClientConfig.java:88)
>>>>>   at org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:120)
>>>>>   at org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:117)
>>>>>   at org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(Values.java:340)
>>>>>   at org.glassfish.jersey.client.ClientConfig.getRuntime(ClientConfig.java:726)
>>>>>   at org.glassfish.jersey.client.ClientRequest.getConfiguration(ClientRequest.java:285)
>>>>>   at org.glassfish.jersey.client.JerseyInvocation.validateHttpMethodAndEntity(JerseyInvocation.java:126)
>>>>>
>>>>> I also get this failure consistently:
>>>>>
>>>>> DirectKafkaStreamSuite
>>>>> - offset recovery *** FAILED ***
>>>>>   recoveredOffsetRanges.forall(((or: (org.apache.spark.streaming.Time, Array[org.apache.spark.streaming.kafka.OffsetRange])) => earlierOffsetRangesAsSets.contains(scala.Tuple2.apply[org.apache.spark.streaming.Time, scala.collection.immutable.Set[org.apache.spark.streaming.kafka.OffsetRange]](or._1, scala.this.Predef.refArrayOps[org.apache.spark.streaming.kafka.OffsetRange](or._2).toSet[org.apache.spark.streaming.kafka.OffsetRange])))) was false
>>>>>   Recovered ranges are not the same as the ones generated (DirectKafkaStreamSuite.scala:301)
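For anyone testing the same checkpoint-based offset recovery path outside the suite, a minimal sketch of the pattern that test exercises; the broker address, topic, checkpoint path, and batch interval are placeholders, and the spark-streaming-kafka artifact for this RC must be on the classpath.

  import kafka.serializer.StringDecoder
  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka.KafkaUtils

  object OffsetRecoverySketch {
    def main(args: Array[String]): Unit = {
      val checkpointDir = "/tmp/direct-kafka-checkpoint"  // placeholder path

      def createContext(): StreamingContext = {
        val conf = new SparkConf().setMaster("local[2]").setAppName("offset-recovery-sketch")
        val ssc = new StreamingContext(conf, Seconds(2))
        ssc.checkpoint(checkpointDir)
        val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")  // placeholder broker
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, Set("test-topic"))
        stream.map(_._2).count().print()
        ssc
      }

      // On restart, offset ranges are recovered from the checkpoint; the failing assertion
      // compares these recovered ranges with the ones generated before the restart.
      val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
      ssc.start()
      ssc.awaitTerminationOrTimeout(10000)
      ssc.stop()
    }
  }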
>>>>>
>>>>> On Wed, Dec 2, 2015 at 8:26 PM, Michael Armbrust <mich...@databricks.com> wrote:
>>>>>
>>>>> > Please vote on releasing the following candidate as Apache Spark version 1.6.0!
>>>>> >
>>>>> > The vote is open until Saturday, December 5, 2015 at 21:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast.
>>>>> >
>>>>> > [ ] +1 Release this package as Apache Spark 1.6.0
>>>>> > [ ] -1 Do not release this package because ...
>>>>> >
>>>>> > To learn more about Apache Spark, please see http://spark.apache.org/
>>>>> >
>>>>> > The tag to be voted on is v1.6.0-rc1 (bf525845cef159d2d4c9f4d64e158f037179b5c4)
>>>>> >
>>>>> > The release files, including signatures, digests, etc. can be found at:
>>>>> > http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-rc1-bin/
>>>>> >
>>>>> > Release artifacts are signed with the following key:
>>>>> > https://people.apache.org/keys/committer/pwendell.asc
>>>>> >
>>>>> > The staging repository for this release can be found at:
>>>>> > https://repository.apache.org/content/repositories/orgapachespark-1165/
>>>>> >
>>>>> > The test repository (versioned as v1.6.0-rc1) for this release can be found at:
>>>>> > https://repository.apache.org/content/repositories/orgapachespark-1164/
>>>>> >
>>>>> > The documentation corresponding to this release can be found at:
>>>>> > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc1-docs/
>>>>> >
>>>>> > =======================================
>>>>> > == How can I help test this release? ==
>>>>> > =======================================
>>>>> > If you are a Spark user, you can help us test this release by taking an existing Spark workload, running it on this release candidate, and then reporting any regressions.
>>>>> >
>>>>> > ================================================
>>>>> > == What justifies a -1 vote for this release? ==
>>>>> > ================================================
>>>>> > This vote is happening towards the end of the 1.6 QA period, so -1 votes should only occur for significant regressions from 1.5. Bugs already present in 1.5, minor regressions, or bugs related to new features will not block this release.
>>>>> >
>>>>> > ===============================================================
>>>>> > == What should happen to JIRA tickets still targeting 1.6.0? ==
>>>>> > ===============================================================
>>>>> > 1. It is OK for documentation patches to target 1.6.0 and still go into branch-1.6, since documentation will be published separately from the release.
>>>>> > 2. New features for non-alpha modules should target 1.7+.
>>>>> > 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target version.
>>>>> >
>>>>> > ==================================================
>>>>> > == Major changes to help you focus your testing ==
>>>>> > ==================================================
>>>>> >
>>>>> > Spark SQL
>>>>> >
>>>>> > SPARK-10810 Session Management - The ability to create multiple isolated SQLContexts that have their own configuration and default database. This is turned on by default in the thrift server.
>>>>> > SPARK-9999 Dataset API - A type-safe API (similar to RDDs) that performs many operations on serialized binary data and uses code generation (i.e. Project Tungsten). (A short sketch follows this Spark SQL list.)
>>>>> > SPARK-10000 Unified Memory Management - Shared memory for execution and caching instead of an exclusive division of the regions.
>>>>> > SPARK-11197 SQL Queries on Files - Concise syntax for running SQL queries over files of any supported format without registering a table.
>>>>> > SPARK-11745 Reading non-standard JSON files - Added options to read non-standard JSON files (e.g. single quotes, unquoted attributes).
>>>>> > SPARK-10412 Per-operator Metrics for SQL Execution - Display statistics on a per-operator basis for memory usage and spilled data size.
>>>>> > SPARK-11329 Star (*) expansion for StructTypes - Makes it easier to nest and unnest arbitrary numbers of columns.
>>>>> > SPARK-10917, SPARK-11149 In-memory Columnar Cache Performance - Significant (up to 14x) speedup when caching data that contains complex types in DataFrames or SQL.
>>>>> > SPARK-11111 Fast null-safe joins - Joins using null-safe equality (<=>) will now execute using SortMergeJoin instead of computing a cartesian product.
>>>>> > SPARK-11389 SQL Execution Using Off-Heap Memory - Support for configuring query execution to use off-heap memory to avoid GC overhead.
>>>>> > SPARK-10978 Datasource API Avoid Double Filter - When implementing a datasource with filter pushdown, developers can now tell Spark SQL to avoid double-evaluating a pushed-down filter.
>>>>> > SPARK-4849 Advanced Layout of Cached Data - Store partitioning and ordering schemes for in-memory table scans, and add distributeBy and localSort to the DataFrame API.
>>>>> > SPARK-9858 Adaptive query execution - Initial support for automatically selecting the number of reducers for joins and aggregations.
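A minimal sketch exercising a few of the Spark SQL items above (the Dataset API, SQL queries on files, and null-safe joins); the Person schema, sample data, and file path are illustrative only and are not taken from the release notes.

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext

  case class Person(name: String, age: Long)

  object Rc1SqlSmoke {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("rc1-sql-smoke"))
      val sqlContext = new SQLContext(sc)
      import sqlContext.implicits._

      // SPARK-9999: typed Dataset operations, executed over encoded binary rows.
      val people = Seq(Person("alice", 30L), Person("bob", 17L)).toDS()
      people.filter(_.age >= 18).map(_.name).show()

      // SPARK-11197: query a file directly without registering a table
      // (uncomment and point at a real file to try it).
      // sqlContext.sql("SELECT * FROM parquet.`/path/to/some.parquet`").show()

      // SPARK-11111: null-safe equality (<=>) joins no longer compute a cartesian product.
      val left  = Seq(("a", Some(1)), ("b", None: Option[Int])).toDF("k", "v")
      val right = Seq(("a", Some(1)), ("c", None: Option[Int])).toDF("k2", "v2")
      left.join(right, left("v") <=> right("v2")).show()

      sc.stop()
    }
  }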
>>>>> >
>>>>> > Spark Streaming
>>>>> >
>>>>> > API Updates
>>>>> >
>>>>> > SPARK-2629 New improved state management - trackStateByKey, a DStream transformation for stateful stream processing, supersedes updateStateByKey in functionality and performance.
>>>>> > SPARK-11198 Kinesis record deaggregation - Kinesis streams have been upgraded to use KCL 1.4.0 and support transparent deaggregation of KPL-aggregated records.
>>>>> > SPARK-10891 Kinesis message handler function - Allows an arbitrary function to be applied to a Kinesis record in the Kinesis receiver to customize what data is stored in memory.
>>>>> > SPARK-6328 Python Streaming Listener API - Get streaming statistics (scheduling delays, batch processing times, etc.) in Python.
>>>>> >
>>>>> > UI Improvements
>>>>> >
>>>>> > Made failures visible in the streaming tab, in the timelines, batch list, and batch details page.
>>>>> > Made output operations visible in the streaming tab as progress bars.
>>>>> >
>>>>> > MLlib
>>>>> >
>>>>> > New algorithms/models
>>>>> >
>>>>> > SPARK-8518 Survival analysis - Log-linear model for survival analysis
>>>>> > SPARK-9834 Normal equation for least squares - Normal equation solver, providing R-like model summary statistics
>>>>> > SPARK-3147 Online hypothesis testing - A/B testing in the Spark Streaming framework
>>>>> > SPARK-9930 New feature transformers - ChiSqSelector, QuantileDiscretizer, SQL transformer
>>>>> > SPARK-6517 Bisecting K-Means clustering - Fast top-down clustering variant of K-Means
>>>>> >
>>>>> > API improvements
>>>>> >
>>>>> > ML Pipelines
>>>>> >
>>>>> > SPARK-6725 Pipeline persistence - Save/load for ML Pipelines, with partial coverage of spark.ml algorithms
>>>>> > SPARK-5565 LDA in ML Pipelines - API for Latent Dirichlet Allocation in ML Pipelines
>>>>> >
>>>>> > R API
>>>>> >
>>>>> > SPARK-9836 R-like statistics for GLMs - (Partial) R-like stats for ordinary least squares via summary(model)
>>>>> > SPARK-9681 Feature interactions in R formula - Interaction operator ":" in R formula
>>>>> >
>>>>> > Python API - Many improvements to the Python API to approach feature parity
>>>>> >
>>>>> > Misc improvements
>>>>> >
>>>>> > SPARK-7685, SPARK-9642 Instance weights for GLMs - Logistic and Linear Regression can take instance weights
>>>>> > SPARK-10384, SPARK-10385 Univariate and bivariate statistics in DataFrames - Variance, stddev, correlations, etc.
>>>>> > SPARK-10117 LIBSVM data source - LIBSVM as a SQL data source
>>>>> >
>>>>> > Documentation improvements
>>>>> >
>>>>> > SPARK-7751 @since versions - Documentation includes the initial version in which classes and methods were added
>>>>> > SPARK-11337 Testable example code - Automated testing for code in user guide examples
>>>>> >
>>>>> > Deprecations
>>>>> >
>>>>> > In spark.mllib.clustering.KMeans, the "runs" parameter has been deprecated.
>>>>> > In spark.ml.classification.LogisticRegressionModel and spark.ml.regression.LinearRegressionModel, the "weights" field has been deprecated in favor of the new name "coefficients." This helps disambiguate it from the instance (row) weights given to algorithms.
>>>>> >
>>>>> > Changes of behavior
>>>>> >
>>>>> > spark.mllib.tree.GradientBoostedTrees validationTol has changed semantics in 1.6. Previously, it was a threshold for absolute change in error. Now, it resembles the behavior of GradientDescent convergenceTol: for large errors, it uses relative error (relative to the previous error); for small errors (< 0.01), it uses absolute error.
>>>>> > spark.ml.feature.RegexTokenizer: Previously, it did not convert strings to lowercase before tokenizing. Now, it converts to lowercase by default, with an option not to. This matches the behavior of the simpler Tokenizer transformer. (A short sketch follows at the end of this message.)
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>>>> For additional commands, e-mail: dev-h...@spark.apache.org
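A minimal sketch of the RegexTokenizer behavior change noted above, showing the new lowercasing default and the opt-out; the input data and column names are illustrative, and the opt-out setter name is assumed to be setToLowercase as introduced in this release.

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext
  import org.apache.spark.ml.feature.RegexTokenizer

  object TokenizerBehaviorCheck {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("tokenizer-check"))
      val sqlContext = new SQLContext(sc)
      import sqlContext.implicits._

      val df = Seq((0, "Spark RC Testing")).toDF("id", "text")

      // 1.6 default: tokens are lowercased, matching the simpler Tokenizer.
      val defaultTok = new RegexTokenizer().setInputCol("text").setOutputCol("words")
      defaultTok.transform(df).show(false)

      // Opt out to keep the pre-1.6 behavior of preserving case.
      val preserveCase = new RegexTokenizer()
        .setInputCol("text").setOutputCol("words").setToLowercase(false)
      preserveCase.transform(df).show(false)

      sc.stop()
    }
  }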