I've heard this argument before, but I don't quite get it. Documentation is part of a release, and I believe it is something we're voting on here too, so it needs to 'work' as documentation. We could not publish this HTML to the Apache site, so I think that does mean the artifacts, docs included, don't work as a release.

Yes, I can see that the non-code artifacts can be released a little after the code artifacts, with last-minute fixes. But the whole release can just happen later too. Why wouldn't this be a valid reason to block the release?

On Sat, Dec 12, 2015 at 6:31 PM, Michael Armbrust <mich...@databricks.com> wrote:

> Thanks Ben, but as I said in the first email, docs are published
> separately from the release, so this isn't a valid reason to down vote the
> RC. We just provide them to help with testing.
>
> I'll ask the mllib guys to take a look at that patch though.
> On Dec 12, 2015 9:44 AM, "Benjamin Fradet" <benjamin.fra...@gmail.com>
> wrote:
>
>> -1
>>
>> For me the docs are not displaying except for the first page, for example
>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-docs/mllib-guide.html
>> is a blank page.
>> This is because of SPARK-12199
>> <https://github.com/apache/spark/pull/10193>:
>> the Element[W|w]iseProductExample.scala file name referenced in the docs
>> does not match the actual file name.
>>
>> On Sat, Dec 12, 2015 at 6:39 PM, Michael Armbrust <mich...@databricks.com>
>> wrote:
>>
>>> I'll kick off the voting with a +1.
>>>
>>> On Sat, Dec 12, 2015 at 9:39 AM, Michael Armbrust <
>>> mich...@databricks.com> wrote:
>>>
>>>> Please vote on releasing the following candidate as Apache Spark
>>>> version 1.6.0!
>>>>
>>>> The vote is open until Tuesday, December 15, 2015 at 6:00 UTC and
>>>> passes if a majority of at least 3 +1 PMC votes are cast.
>>>>
>>>> [ ] +1 Release this package as Apache Spark 1.6.0
>>>> [ ] -1 Do not release this package because ...
>>>>
>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>
>>>> The tag to be voted on is *v1.6.0-rc2
>>>> (23f8dfd45187cb8f2216328ab907ddb5fbdffd0b)
>>>> <https://github.com/apache/spark/tree/v1.6.0-rc2>*
>>>>
>>>> The release files, including signatures, digests, etc. can be found at:
>>>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-bin/
>>>>
>>>> Release artifacts are signed with the following key:
>>>> https://people.apache.org/keys/committer/pwendell.asc
>>>>
>>>> The staging repository for this release can be found at:
>>>> https://repository.apache.org/content/repositories/orgapachespark-1169/
>>>>
>>>> The test repository (versioned as v1.6.0-rc2) for this release can be
>>>> found at:
>>>> https://repository.apache.org/content/repositories/orgapachespark-1168/
>>>>
>>>> The documentation corresponding to this release can be found at:
>>>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-docs/
>>>>
>>>> =======================================
>>>> == How can I help test this release? ==
>>>> =======================================
>>>> If you are a Spark user, you can help us test this release by taking an
>>>> existing Spark workload and running it on this release candidate, then
>>>> reporting any regressions.
>>>>
>>>> ================================================
>>>> == What justifies a -1 vote for this release? ==
>>>> ================================================
>>>> This vote is happening towards the end of the 1.6 QA period, so -1
>>>> votes should only occur for significant regressions from 1.5. Bugs already
>>>> present in 1.5, minor regressions, or bugs related to new features will not
>>>> block this release.
>>>>
>>>> ===============================================================
>>>> == What should happen to JIRA tickets still targeting 1.6.0? ==
>>>> ===============================================================
>>>> 1. It is OK for documentation patches to target 1.6.0 and still go into
>>>> branch-1.6, since documentation will be published separately from the
>>>> release.
>>>> 2. New features for non-alpha-modules should target 1.7+.
>>>> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the
>>>> target version.
>>>>
>>>>
>>>> ==================================================
>>>> == Major changes to help you focus your testing ==
>>>> ==================================================
>>>>
>>>> Spark 1.6.0 Preview
>>>>
>>>> Notable changes since 1.6 RC1
>>>>
>>>> Spark Streaming
>>>>
>>>> - SPARK-2629 <https://issues.apache.org/jira/browse/SPARK-2629>
>>>> trackStateByKey has been renamed to mapWithState
>>>>
>>>> Spark SQL
>>>>
>>>> - SPARK-12165 <https://issues.apache.org/jira/browse/SPARK-12165>
>>>> SPARK-12189 <https://issues.apache.org/jira/browse/SPARK-12189> Fix
>>>> bugs in eviction of storage memory by execution.
>>>> - SPARK-12258 <https://issues.apache.org/jira/browse/SPARK-12258>
>>>> Correct passing null into ScalaUDF
>>>>
>>>> Notable Features Since 1.5
>>>>
>>>> Spark SQL
>>>>
>>>> - SPARK-11787 <https://issues.apache.org/jira/browse/SPARK-11787>
>>>> Parquet Performance - Improve Parquet scan performance when using flat
>>>> schemas.
>>>> - SPARK-10810 <https://issues.apache.org/jira/browse/SPARK-10810>
>>>> Session Management - Isolated default database (i.e. USE mydb) even
>>>> on shared clusters.
>>>> - SPARK-9999 <https://issues.apache.org/jira/browse/SPARK-9999> Dataset
>>>> API - A type-safe API (similar to RDDs) that performs many
>>>> operations on serialized binary data and code generation (i.e. Project
>>>> Tungsten).
>>>> - SPARK-10000 <https://issues.apache.org/jira/browse/SPARK-10000>
>>>> Unified Memory Management - Shared memory for execution and caching
>>>> instead of exclusive division of the regions.
>>>> - SPARK-11197 <https://issues.apache.org/jira/browse/SPARK-11197> SQL
>>>> Queries on Files - Concise syntax for running SQL queries over
>>>> files of any supported format without registering a table.
>>>> - SPARK-11745 <https://issues.apache.org/jira/browse/SPARK-11745>
>>>> Reading non-standard JSON files - Added options to read non-standard
>>>> JSON files (e.g. single-quotes, unquoted attributes)
>>>> - SPARK-10412 <https://issues.apache.org/jira/browse/SPARK-10412>
>>>> Per-operator Metrics for SQL Execution - Display statistics on a
>>>> per-operator basis for memory usage and spilled data size.
>>>> - SPARK-11329 <https://issues.apache.org/jira/browse/SPARK-11329> Star
>>>> (*) expansion for StructTypes - Makes it easier to nest and unnest
>>>> arbitrary numbers of columns
>>>> - SPARK-10917 <https://issues.apache.org/jira/browse/SPARK-10917>,
>>>> SPARK-11149 <https://issues.apache.org/jira/browse/SPARK-11149>
>>>> In-memory Columnar Cache Performance - Significant (up to 14x) speed up
>>>> when caching data that contains complex types in DataFrames or SQL.
>>>> - SPARK-11111 <https://issues.apache.org/jira/browse/SPARK-11111> Fast
>>>> null-safe joins - Joins using null-safe equality (<=>) will now
>>>> execute using SortMergeJoin instead of computing a Cartesian product.
>>>> - SPARK-11389 <https://issues.apache.org/jira/browse/SPARK-11389> SQL
>>>> Execution Using Off-Heap Memory - Support for configuring query
>>>> execution to occur using off-heap memory to avoid GC overhead
>>>> - SPARK-10978 <https://issues.apache.org/jira/browse/SPARK-10978>
>>>> Datasource API Avoid Double Filter - When implementing a datasource
>>>> with filter pushdown, developers can now tell Spark SQL to avoid double
>>>> evaluating a pushed-down filter.
>>>> - SPARK-4849 <https://issues.apache.org/jira/browse/SPARK-4849>
>>>> Advanced Layout of Cached Data - storing partitioning and ordering
>>>> schemes in In-memory table scan, and adding distributeBy and localSort
>>>> to DF API
>>>> - SPARK-9858 <https://issues.apache.org/jira/browse/SPARK-9858>
>>>> Adaptive query execution - Initial support for automatically selecting
>>>> the number of reducers for joins and aggregations.
>>>> - SPARK-9241 <https://issues.apache.org/jira/browse/SPARK-9241>
>>>> Improved query planner for queries having distinct aggregations - Query
>>>> plans of distinct aggregations are more robust when distinct columns
>>>> have high cardinality.
>>>>
>>>> Spark Streaming
>>>>
>>>> - API Updates
>>>>    - SPARK-2629 <https://issues.apache.org/jira/browse/SPARK-2629> New
>>>>    improved state management - mapWithState - a DStream
>>>>    transformation for stateful stream processing, supersedes
>>>>    updateStateByKey in functionality and performance.
>>>>    - SPARK-11198 <https://issues.apache.org/jira/browse/SPARK-11198>
>>>>    Kinesis record deaggregation - Kinesis streams have been
>>>>    upgraded to use KCL 1.4.0 and support transparent deaggregation of
>>>>    KPL-aggregated records.
>>>>    - SPARK-10891 <https://issues.apache.org/jira/browse/SPARK-10891>
>>>>    Kinesis message handler function - Allows an arbitrary function
>>>>    to be applied to a Kinesis record in the Kinesis receiver to
>>>>    customize what data is stored in memory.
>>>>    - SPARK-6328 <https://issues.apache.org/jira/browse/SPARK-6328>
>>>>    Python Streaming Listener API - Get streaming statistics (scheduling
>>>>    delays, batch processing times, etc.) in streaming.
>>>>
>>>> - UI Improvements
>>>>    - Made failures visible in the streaming tab, in the timelines,
>>>>    batch list, and batch details page.
>>>>    - Made output operations visible in the streaming tab as
>>>>    progress bars.
>>>>
>>>> MLlib
>>>>
>>>> New algorithms/models
>>>>
>>>> - SPARK-8518 <https://issues.apache.org/jira/browse/SPARK-8518>
>>>> Survival analysis - Log-linear model for survival analysis
>>>> - SPARK-9834 <https://issues.apache.org/jira/browse/SPARK-9834> Normal
>>>> equation for least squares - Normal equation solver, providing
>>>> R-like model summary statistics
>>>> - SPARK-3147 <https://issues.apache.org/jira/browse/SPARK-3147> Online
>>>> hypothesis testing - A/B testing in the Spark Streaming framework
>>>> - SPARK-9930 <https://issues.apache.org/jira/browse/SPARK-9930> New
>>>> feature transformers - ChiSqSelector, QuantileDiscretizer, SQL
>>>> transformer
>>>> - SPARK-6517 <https://issues.apache.org/jira/browse/SPARK-6517>
>>>> Bisecting K-Means clustering - Fast top-down clustering variant of
>>>> K-Means
>>>>
>>>> API improvements
>>>>
>>>> - ML Pipelines
>>>>    - SPARK-6725 <https://issues.apache.org/jira/browse/SPARK-6725>
>>>>    Pipeline persistence - Save/load for ML Pipelines, with partial
>>>>    coverage of spark.ml algorithms
>>>>    - SPARK-5565 <https://issues.apache.org/jira/browse/SPARK-5565> LDA
>>>>    in ML Pipelines - API for Latent Dirichlet Allocation in ML
>>>>    Pipelines
>>>> - R API
>>>>    - SPARK-9836 <https://issues.apache.org/jira/browse/SPARK-9836>
>>>>    R-like statistics for GLMs - (Partial) R-like stats for ordinary
>>>>    least squares via summary(model)
>>>>    - SPARK-9681 <https://issues.apache.org/jira/browse/SPARK-9681>
>>>>    Feature interactions in R formula - Interaction operator ":" in R
>>>>    formula
>>>> - Python API - Many improvements to Python API to approach feature
>>>> parity
>>>>
>>>> Misc improvements
>>>>
>>>> - SPARK-7685 <https://issues.apache.org/jira/browse/SPARK-7685>,
>>>> SPARK-9642 <https://issues.apache.org/jira/browse/SPARK-9642> Instance
>>>> weights for GLMs - Logistic and Linear Regression can take instance
>>>> weights
>>>> - SPARK-10384 <https://issues.apache.org/jira/browse/SPARK-10384>,
>>>> SPARK-10385 <https://issues.apache.org/jira/browse/SPARK-10385>
>>>> Univariate and bivariate statistics in DataFrames - Variance, stddev,
>>>> correlations, etc.
>>>> - SPARK-10117 <https://issues.apache.org/jira/browse/SPARK-10117> LIBSVM
>>>> data source - LIBSVM as a SQL data source
>>>>
>>>> Documentation improvements
>>>>
>>>> - SPARK-7751 <https://issues.apache.org/jira/browse/SPARK-7751> @since
>>>> versions - Documentation includes initial version when classes and
>>>> methods were added
>>>> - SPARK-11337 <https://issues.apache.org/jira/browse/SPARK-11337>
>>>> Testable example code - Automated testing for code in user guide
>>>> examples
>>>>
>>>> Deprecations
>>>>
>>>> - In spark.mllib.clustering.KMeans, the "runs" parameter has been
>>>> deprecated.
>>>> - In spark.ml.classification.LogisticRegressionModel and
>>>> spark.ml.regression.LinearRegressionModel, the "weights" field has been
>>>> deprecated, in favor of the new name "coefficients." This helps
>>>> disambiguate from instance (row) weights given to algorithms.
>>>>
>>>> Changes of behavior
>>>>
>>>> - spark.mllib.tree.GradientBoostedTrees validationTol has changed
>>>> semantics in 1.6. Previously, it was a threshold for absolute change in
>>>> error. Now, it resembles the behavior of GradientDescent convergenceTol:
>>>> For large errors, it uses relative error (relative to the previous
>>>> error); for small errors (< 0.01), it uses absolute error.
>>>> - spark.ml.feature.RegexTokenizer: Previously, it did not convert
>>>> strings to lowercase before tokenizing. Now, it converts to lowercase by
>>>> default, with an option not to. This matches the behavior of the simpler
>>>> Tokenizer transformer.
>>>> - Spark SQL's partition discovery has been changed to only discover
>>>> partition directories that are children of the given path. (i.e. if
>>>> path="/my/data/x=1" then x=1 will no longer be considered a
>>>> partition but only children of x=1.) This behavior can be
>>>> overridden by manually specifying the basePath that partitioning
>>>> discovery should start with (SPARK-11678
>>>> <https://issues.apache.org/jira/browse/SPARK-11678>).
>>>> - When casting a value of an integral type to timestamp (e.g.
>>>> casting a long value to timestamp), the value is treated as being in
>>>> seconds instead of milliseconds (SPARK-11724
>>>> <https://issues.apache.org/jira/browse/SPARK-11724>).
>>>> - With the improved query planner for queries having distinct
>>>> aggregations (SPARK-9241
>>>> <https://issues.apache.org/jira/browse/SPARK-9241>), the plan of a
>>>> query having a single distinct aggregation has been changed to a more
>>>> robust version. To switch back to the plan generated by Spark 1.5's
>>>> planner, please set spark.sql.specializeSingleDistinctAggPlanning
>>>> to true (SPARK-12077
>>>> <https://issues.apache.org/jira/browse/SPARK-12077>).
>>>>
>>>
>>
>> --
>> Ben Fradet.
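
For anyone testing the integral-to-timestamp change listed in the quoted notes (SPARK-11724), here is a minimal sketch in plain Python, not Spark code, of why it matters: the same long value lands decades apart depending on whether it is read as seconds or as milliseconds. The sample value and both helper names are made up for illustration, and the "old" reading is only what the wording of the notes implies.

```python
from datetime import datetime, timezone


def as_seconds(value: int) -> datetime:
    """Read an integral value as seconds since the Unix epoch
    (the 1.6 behavior, per the quoted release notes)."""
    return datetime.fromtimestamp(value, tz=timezone.utc)


def as_milliseconds(value: int) -> datetime:
    """Read the same value as milliseconds since the Unix epoch
    (the earlier behavior, as the notes imply)."""
    return datetime.fromtimestamp(value / 1000, tz=timezone.utc)


# Hypothetical long column value, chosen only for illustration.
sample = 1_450_000_000

print("as seconds:     ", as_seconds(sample).isoformat())
print("as milliseconds:", as_milliseconds(sample).isoformat())
```

A workload that stores epoch milliseconds in a long column and casts it to timestamp would therefore need to divide by 1000 first to keep its old results.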