Thanks, Sean, for sending me the logs offline. It turns out the tests are failing again, for reasons unrelated to Spark. I have filed https://issues.apache.org/jira/browse/SPARK-12426 for that with some details. In the meantime, I agree with Sean: these tests should be disabled. And, again, I don't think these failures warrant blocking the release.
Mark

On Fri, Dec 18, 2015 at 9:32 AM, Sean Owen <so...@cloudera.com> wrote:
> Yes, that's what I mean. If they're not quite working, let's disable
> them, but first, we have to rule out that I'm not just missing some
> requirement.
>
> Functionally, it's not worth blocking the release. It seems like bad
> form to release with tests that always fail for a non-trivial number
> of users, but we have to establish that. If it's something with an
> easy fix (or needs disabling) and another RC needs to be baked, it
> might be worth including.
>
> Logs coming offline
>
> On Fri, Dec 18, 2015 at 5:30 PM, Mark Grover <m...@apache.org> wrote:
> > Sean,
> > Are you referring to the docker integration tests? If so, they were
> > disabled for the majority of the release. I recently worked on them
> > (SPARK-11796) and, once that got committed, the tests were re-enabled
> > in Spark builds. I am not sure what OSs the test builds use, but they
> > should be passing there too.
> >
> > During my work, I tested on Ubuntu Precise and they worked. If you
> > could share the logs with me offline, I could take a look.
> > Alternatively, I can try to get an Ubuntu 15 instance. However, given
> > the history of these tests, I personally don't think it makes sense
> > to block the release based on them not running on Ubuntu 15.
> >
> > On Fri, Dec 18, 2015 at 9:22 AM, Sean Owen <so...@cloudera.com> wrote:
> >>
> >> For me, mostly the same as before: tests are mostly passing, but I can
> >> never get the docker tests to pass. If anyone knows a special profile
> >> or package that needs to be enabled, I can try that and/or
> >> fix/document it. Just wondering if it's me.
> >>
> >> I'm on Java 7 + Ubuntu 15.10, with -Pyarn -Phive -Phive-thriftserver
> >> -Phadoop-2.6
> >>
> >> On Wed, Dec 16, 2015 at 9:32 PM, Michael Armbrust
> >> <mich...@databricks.com> wrote:
> >> > Please vote on releasing the following candidate as Apache Spark
> >> > version 1.6.0!
> >> >
> >> > The vote is open until Saturday, December 19, 2015 at 18:00 UTC and
> >> > passes if a majority of at least 3 +1 PMC votes are cast.
> >> >
> >> > [ ] +1 Release this package as Apache Spark 1.6.0
> >> > [ ] -1 Do not release this package because ...
> >> >
> >> > To learn more about Apache Spark, please see http://spark.apache.org/
> >> >
> >> > The tag to be voted on is v1.6.0-rc3
> >> > (168c89e07c51fa24b0bb88582c739cec0acb44d7)
> >> >
> >> > The release files, including signatures, digests, etc. can be found at:
> >> > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-bin/
> >> >
> >> > Release artifacts are signed with the following key:
> >> > https://people.apache.org/keys/committer/pwendell.asc
> >> >
> >> > The staging repository for this release can be found at:
> >> > https://repository.apache.org/content/repositories/orgapachespark-1174/
> >> >
> >> > The test repository (versioned as v1.6.0-rc3) for this release can
> >> > be found at:
> >> > https://repository.apache.org/content/repositories/orgapachespark-1173/
> >> >
> >> > The documentation corresponding to this release can be found at:
> >> > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-docs/
> >> >
> >> > =======================================
> >> > == How can I help test this release? ==
> >> > =======================================
> >> > If you are a Spark user, you can help us test this release by taking
> >> > an existing Spark workload, running it on this release candidate,
> >> > and reporting any regressions.
> >> >
> >> > ================================================
> >> > == What justifies a -1 vote for this release? ==
> >> > ================================================
> >> > This vote is happening towards the end of the 1.6 QA period, so -1
> >> > votes should only occur for significant regressions from 1.5. Bugs
> >> > already present in 1.5, minor regressions, or bugs related to new
> >> > features will not block this release.
> >> >
> >> > ===============================================================
> >> > == What should happen to JIRA tickets still targeting 1.6.0? ==
> >> > ===============================================================
> >> > 1. It is OK for documentation patches to target 1.6.0 and still go
> >> >    into branch-1.6, since documentation will be published separately
> >> >    from the release.
> >> > 2. New features for non-alpha modules should target 1.7+.
> >> > 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the
> >> >    target version.
> >> >
> >> > ==================================================
> >> > == Major changes to help you focus your testing ==
> >> > ==================================================
> >> >
> >> > Notable changes since 1.6 RC2
> >> >
> >> > - SPARK_VERSION has been set correctly
> >> > - SPARK-12199 ML docs are publishing correctly
> >> > - SPARK-12345 Mesos cluster mode has been fixed
> >> >
> >> > Notable changes since 1.6 RC1
> >> >
> >> > Spark Streaming
> >> >
> >> > - SPARK-2629 trackStateByKey has been renamed to mapWithState
> >> >
> >> > Spark SQL
> >> >
> >> > - SPARK-12165, SPARK-12189 Fix bugs in eviction of storage memory by
> >> >   execution.
> >> > - SPARK-12258 Correct passing null into ScalaUDF.
> >> >
> >> > Notable Features Since 1.5
> >> >
> >> > Spark SQL
> >> >
> >> > - SPARK-11787 Parquet Performance - Improve Parquet scan performance
> >> >   when using flat schemas.
> >> > - SPARK-10810 Session Management - Isolated default database (i.e.
> >> >   USE mydb) even on shared clusters.
> >> > - SPARK-9999 Dataset API - A type-safe API (similar to RDDs) that
> >> >   performs many operations on serialized binary data and code
> >> >   generation (i.e. Project Tungsten).
> >> > - SPARK-10000 Unified Memory Management - Shared memory for execution
> >> >   and caching instead of exclusive division of the regions.
> >> > - SPARK-11197 SQL Queries on Files - Concise syntax for running SQL
> >> >   queries over files of any supported format without registering a
> >> >   table.
> >> > - SPARK-11745 Reading non-standard JSON files - Added options to read
> >> >   non-standard JSON files (e.g. single quotes, unquoted attributes).
> >> > - SPARK-10412 Per-operator Metrics for SQL Execution - Display
> >> >   statistics on a per-operator basis for memory usage and spilled
> >> >   data size.
> >> > - SPARK-11329 Star (*) expansion for StructTypes - Makes it easier to
> >> >   nest and unnest arbitrary numbers of columns.
> >> > - SPARK-10917, SPARK-11149 In-memory Columnar Cache Performance -
> >> >   Significant (up to 14x) speedup when caching data that contains
> >> >   complex types in DataFrames or SQL.
> >> > - SPARK-11111 Fast null-safe joins - Joins using null-safe equality
> >> >   (<=>) will now execute using SortMergeJoin instead of computing a
> >> >   cartesian product.
> >> > - SPARK-11389 SQL Execution Using Off-Heap Memory - Support for
> >> >   configuring query execution to occur using off-heap memory to
> >> >   avoid GC overhead.
> >> > - SPARK-10978 Datasource API Avoid Double Filter - When implementing
> >> >   a datasource with filter pushdown, developers can now tell Spark
> >> >   SQL to avoid double-evaluating a pushed-down filter.
> >> > - SPARK-4849 Advanced Layout of Cached Data - Storing partitioning
> >> >   and ordering schemes in in-memory table scan, and adding
> >> >   distributeBy and localSort to the DataFrame API.
> >> > - SPARK-9858 Adaptive query execution - Initial support for
> >> >   automatically selecting the number of reducers for joins and
> >> >   aggregations.
> >> > - SPARK-9241 Improved query planner for queries having distinct
> >> >   aggregations - Query plans of distinct aggregations are more
> >> >   robust when distinct columns have high cardinality.
> >> >
> >> > Spark Streaming
> >> >
> >> > API Updates
> >> >
> >> > - SPARK-2629 New improved state management - mapWithState - a DStream
> >> >   transformation for stateful stream processing; supersedes
> >> >   updateStateByKey in functionality and performance.
> >> > - SPARK-11198 Kinesis record deaggregation - Kinesis streams have
> >> >   been upgraded to use KCL 1.4.0 and support transparent
> >> >   deaggregation of KPL-aggregated records.
> >> > - SPARK-10891 Kinesis message handler function - Allows an arbitrary
> >> >   function to be applied to a Kinesis record in the Kinesis receiver
> >> >   to customize what data is stored in memory.
> >> > - SPARK-6328 Python Streaming Listener API - Get streaming statistics
> >> >   (scheduling delays, batch processing times, etc.) in streaming.
> >> >
> >> > UI Improvements
> >> >
> >> > - Made failures visible in the streaming tab, in the timelines,
> >> >   batch list, and batch details page.
> >> > - Made output operations visible in the streaming tab as progress
> >> >   bars.
> >> >
> >> > MLlib
> >> >
> >> > New algorithms/models
> >> >
> >> > - SPARK-8518 Survival analysis - Log-linear model for survival
> >> >   analysis.
> >> > - SPARK-9834 Normal equation for least squares - Normal equation
> >> >   solver, providing R-like model summary statistics.
> >> > - SPARK-3147 Online hypothesis testing - A/B testing in the Spark
> >> >   Streaming framework.
> >> > - SPARK-9930 New feature transformers - ChiSqSelector,
> >> >   QuantileDiscretizer, SQL transformer.
> >> > - SPARK-6517 Bisecting K-Means clustering - Fast top-down clustering
> >> >   variant of K-Means.
> >> >
> >> > API improvements
> >> >
> >> > ML Pipelines
> >> >
> >> > - SPARK-6725 Pipeline persistence - Save/load for ML Pipelines, with
> >> >   partial coverage of spark.ml algorithms.
> >> > - SPARK-5565 LDA in ML Pipelines - API for Latent Dirichlet
> >> >   Allocation in ML Pipelines.
> >> >
> >> > R API
> >> >
> >> > - SPARK-9836 R-like statistics for GLMs - (Partial) R-like stats for
> >> >   ordinary least squares via summary(model).
> >> > - SPARK-9681 Feature interactions in R formula - Interaction
> >> >   operator ":" in R formula.
> >> >
> >> > Python API - Many improvements to the Python API to approach feature
> >> > parity.
> >> >
> >> > Misc improvements
> >> >
> >> > - SPARK-7685, SPARK-9642 Instance weights for GLMs - Logistic and
> >> >   Linear Regression can take instance weights.
> >> > - SPARK-10384, SPARK-10385 Univariate and bivariate statistics in
> >> >   DataFrames - Variance, stddev, correlations, etc.
> >> > - SPARK-10117 LIBSVM data source - LIBSVM as a SQL data source.
> >> >
> >> > Documentation improvements
> >> >
> >> > - SPARK-7751 @since versions - Documentation includes the initial
> >> >   version in which classes and methods were added.
> >> > - SPARK-11337 Testable example code - Automated testing for code in
> >> >   user guide examples.
> >> >
> >> > Deprecations
> >> >
> >> > - In spark.mllib.clustering.KMeans, the "runs" parameter has been
> >> >   deprecated.
> >> > - In spark.ml.classification.LogisticRegressionModel and
> >> >   spark.ml.regression.LinearRegressionModel, the "weights" field has
> >> >   been deprecated in favor of the new name "coefficients." This
> >> >   helps disambiguate it from the instance (row) weights given to
> >> >   algorithms.
> >> >
> >> > Changes of behavior
> >> >
> >> > - spark.mllib.tree.GradientBoostedTrees validationTol has changed
> >> >   semantics in 1.6. Previously, it was a threshold for absolute
> >> >   change in error. Now, it resembles the behavior of GradientDescent
> >> >   convergenceTol: for large errors, it uses relative error (relative
> >> >   to the previous error); for small errors (< 0.01), it uses
> >> >   absolute error.
> >> > - spark.ml.feature.RegexTokenizer: Previously, it did not convert
> >> >   strings to lowercase before tokenizing. Now, it converts to
> >> >   lowercase by default, with an option not to. This matches the
> >> >   behavior of the simpler Tokenizer transformer.
> >> > - Spark SQL's partition discovery has been changed to only discover
> >> >   partition directories that are children of the given path (i.e.
> >> >   if path="/my/data/x=1", then x=1 will no longer be considered a
> >> >   partition, but only children of x=1 will). This behavior can be
> >> >   overridden by manually specifying the basePath that partition
> >> >   discovery should start with (SPARK-11678).
> >> > - When casting a value of an integral type to timestamp (e.g.
> >> >   casting a long value to timestamp), the value is treated as being
> >> >   in seconds instead of milliseconds (SPARK-11724).
> >> > - With the improved query planner for queries having distinct
> >> >   aggregations (SPARK-9241), the plan of a query having a single
> >> >   distinct aggregation has been changed to a more robust version. To
> >> >   switch back to the plan generated by Spark 1.5's planner, please
> >> >   set spark.sql.specializeSingleDistinctAggPlanning to true
> >> >   (SPARK-12077).
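
For anyone who wants to help test: below is a minimal sbt sketch for pointing an existing workload at the RC3 staging repository quoted above. Only the repository URL comes from the vote email; the resolver label and the assumption that the artifacts are staged under the plain 1.6.0 version string are mine.

    // build.sbt -- hedged sketch for building against the RC3 staging repo
    resolvers += "Spark 1.6.0 RC3 staging" at
      "https://repository.apache.org/content/repositories/orgapachespark-1174/"

    // Assumes the staged artifacts carry the plain 1.6.0 version string.
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0" % "provided"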
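
Since SPARK-2629 renames trackStateByKey to mapWithState, a small Scala sketch of the new API may help anyone exercising the streaming changes. The DStream name `events` and the running-count logic are illustrative assumptions, not from the thread.

    import org.apache.spark.streaming.{State, StateSpec}
    import org.apache.spark.streaming.dstream.DStream

    // Hypothetical input: a stream of (word, count) pairs.
    def runningCounts(events: DStream[(String, Int)]): DStream[(String, Int)] = {
      val spec = StateSpec.function(
        (word: String, count: Option[Int], state: State[Int]) => {
          val sum = count.getOrElse(0) + state.getOption.getOrElse(0)
          state.update(sum) // persist the running sum for this key
          (word, sum)       // emit the updated pair downstream
        })
      events.mapWithState(spec)
    }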
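
Two of the Spark SQL items above are easy to smoke-test from a spark-shell. A hedged sketch, assuming a 1.6 SQLContext named sqlContext and two hypothetical DataFrames df1 and df2 that share a key column:

    // SPARK-11197: run SQL directly over a file, without registering a
    // table first. The path is a placeholder.
    val fromFile = sqlContext.sql(
      "SELECT * FROM parquet.`/path/to/events.parquet`")

    // SPARK-11111: a null-safe equality join should now plan a
    // SortMergeJoin instead of a cartesian product.
    val joined = df1.join(df2, df1("key") <=> df2("key"))
    joined.explain() // inspect the physical plan for SortMergeJoin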
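
The behavior changes are also straightforward to verify; the sketches below assume the same sqlContext, and the /my/data/x=1 layout comes from the notes themselves.

    // SPARK-11724: an integral value cast to timestamp is now read as
    // seconds, so this yields a 2015 date rather than a 1970 one.
    sqlContext.sql("SELECT CAST(1450000000 AS TIMESTAMP)").show()

    // SPARK-11678: given only the leaf path, x=1 is no longer discovered
    // as a partition column; supplying basePath restores the old behavior.
    val withPartition = sqlContext.read
      .option("basePath", "/my/data")
      .parquet("/my/data/x=1")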