Doc readiness vs releasability

2015-12-12 Thread Sean Owen
-> new subject
I'll send my results from testing the RC separately.

On Sat, Dec 12, 2015 at 11:58 PM, Michael Armbrust wrote:
>
> I'm only suggesting that we shouldn't delay testing of the actual bits, or
> wait to iterate on another RC.  Ideally docs should come out with the
> actual release announcement (and I'll do everything in my power to make
> this happen).  They should also be updated regularly as small issues are
> found.
>

Certainly there is no reason to stop testing, just as nobody would stop if
any other issue were uncovered. You argued against a (non-binding) -1 on the
grounds that doc problems aren't that important to the release, because
they're published separately, possibly later. I don't think that can be the
policy in general, and this is not a trivial problem. Still, docs are
different, because not all project doc artifacts are released with the code.
I think it's a valid factor to consider when judging whether this tarball is
close enough for release. I'm guessing we all actually think it is, so no problem.

Previously people have told me that JIRAs like "Foo docs for x.y" are not
necessarily intended to be done when x.y is released, because the site might
not actually get updated for a week, and fixes can filter in shortly after.
That seemed unnecessarily seat-of-the-pants, and surely people don't mean
it's *supposed* to work that way? While there is
https://issues.apache.org/jira/browse/SPARK-11607 and about four unresolved
doc issues for 1.6, I suspect they're actually done or not very important,
so this is better than before.

I don't think people do mean this. It has in each case been a practical
argument not to hold up the release further, and in each case the release
was late.


>
> But if it can/will be fixed quickly, what's the hurry? I get it, people
>> want releases sooner rather than later, all else equal, but this is always true.
>> It'd be nice to talk about what behaviors have led to being behind schedule
>> and this perceived rush to finish now, since this same thing has happened
>> in 1.5, 1.4. I'd rather at least collect some opinions on it than
>> invalidate the question.
>>
>
> I'm happy to debate concrete process suggestions on another thread.
>

Nothing new here. I dislike feeling that pressure to finish by a date is
impacting judgments about whether it's close enough. But a fixed release
cycle is good. All the activities leading to this RC have looked pretty
good. (Score: still 28 issues targeted for 1.6, 11 bugs, no blockers). They
just seem to be happening several weeks late, which is not nothing in a
13-week cycle, and yes it's enough to cause people to ask "when will this
be ready?"

The formula is supposed to be: 2 months of development, 2 weeks of QA, 2
weeks of finalizing RCs, which seems right. 1.5 was about 2 weeks 'late' and
in practice pushed this cycle back, which means this one isn't far off, just
shifted. Closer.

Concretely: to make releases reliably on time without a rush, RCs should
start on time. That means burning down critical issues in QA earlier, which
means not overrunning the development window, which means scoping the right
amount of work for ~2 months, and that requires earlier alignment on what to
do (and not do), which leads back to paying attention to the plans already
made in JIRA. It's getting better, and I think people agree it's a Good
Idea, but there is still room for improvement.

For whatever is due to begin QA at the start of February: how about checking
in on its status at the start of each week in January? It would be a small
step toward getting the release schedule onto people's radar earlier.

(I digress, because I imagine this side thread would be a non-issue if we
had been here 3 weeks ago.)


Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Michael Armbrust
I'll kick off the voting with a +1.

[VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Michael Armbrust
Please vote on releasing the following candidate as Apache Spark version
1.6.0!

The vote is open until Tuesday, December 15, 2015 at 6:00 UTC and passes if
a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.6.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is *v1.6.0-rc2
(23f8dfd45187cb8f2216328ab907ddb5fbdffd0b)
*

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1169/

The test repository (versioned as v1.6.0-rc2) for this release can be found
at:
https://repository.apache.org/content/repositories/orgapachespark-1168/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-docs/

=======================================
== How can I help test this release? ==
=======================================
If you are a Spark user, you can help us test this release by taking an
existing Spark workload and running it on this release candidate, then
reporting any regressions.
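
One concrete way to do this is to point an existing workload's build at the
staging repository listed above and rebuild against the RC artifacts. The
following is a minimal sbt sketch, not part of the vote email itself; the
Scala version and module list are assumptions to adjust for your own
workload:

    // build.sbt fragment (sketch): resolve the 1.6.0 RC artifacts from the
    // staging repository given above, then rerun an existing workload.
    scalaVersion := "2.10.5"  // assumption: match the Scala version you build against

    resolvers += ("Spark 1.6.0 RC2 staging" at
      "https://repository.apache.org/content/repositories/orgapachespark-1169/")

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "1.6.0" % "provided",
      "org.apache.spark" %% "spark-sql"  % "1.6.0" % "provided"
    )

The test repository above is versioned as v1.6.0-rc2 instead, which can help
avoid confusing locally cached RC artifacts with the eventual release.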


================================================
== What justifies a -1 vote for this release? ==
================================================
This vote is happening towards the end of the 1.6 QA period, so -1 votes
should only occur for significant regressions from 1.5. Bugs already
present in 1.5, minor regressions, or bugs related to new features will not
block this release.

===============================================================
== What should happen to JIRA tickets still targeting 1.6.0? ==
===============================================================
1. It is OK for documentation patches to target 1.6.0 and still go into
branch-1.6, since documentations will be published separately from the
release.
2. New features for non-alpha-modules should target 1.7+.
3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
version.


==================================================
== Major changes to help you focus your testing ==
==================================================

Spark 1.6.0 Preview

Notable changes since 1.6 RC1

Spark Streaming

   - SPARK-2629  
   trackStateByKey has been renamed to mapWithState

Spark SQL

   - SPARK-12165 
   SPARK-12189  Fix bugs
   in eviction of storage memory by execution.
   - SPARK-12258  correct
   passing null into ScalaUDF

Notable Features Since 1.5

Spark SQL

   - SPARK-11787  Parquet
   Performance - Improve Parquet scan performance when using flat schemas.
   - SPARK-10810 
   Session Management - Isolated default database (i.e. USE mydb) even on
   shared clusters.
   - SPARK-   Dataset
   API - A type-safe API (similar to RDDs) that performs many operations on
   serialized binary data and code generation (i.e. Project Tungsten).
   - SPARK-1  Unified
   Memory Management - Shared memory for execution and caching instead of
   exclusive division of the regions.
   - SPARK-11197  SQL
   Queries on Files - Concise syntax for running SQL queries over files of
   any supported format without registering a table.
   - SPARK-11745  Reading
   non-standard JSON files - Added options to read non-standard JSON files
   (e.g. single-quotes, unquoted attributes)
   - SPARK-10412 
Per-operator
   Metrics for SQL Execution - Display statistics on a per-operator basis
   for memory usage and spilled data size.
   - SPARK-11329  Star
   (*) expansion for StructTypes - Makes it easier to nest and unnest
   arbitrary numbers of columns
   - SPARK-10917 ,
   SPARK-11149  In-memory
   Columnar Cache Performance - Significant (up to 14x) speed up when
   caching data that contains complex types in DataFrames or SQL.
   - SPARK-1  Fast
   null-safe joins - Joins using null-safe equality (<=>) will now execute
   using SortMergeJoin instead of computing a cartesian product.
   - SPARK-11389  SQL Execution Using Off-Heap Memory - Support for
   configuring query execution to occur using off-heap memory to avoid GC
   overhead.
   - SPARK-10978  Datasource API Avoid Double Filter - When implementing a
   datasource with filter pushdown, developers can now tell Spark SQL to
   avoid double evaluating a pushed-down filter.
   - SPARK-4849  Advanced Layout of Cached Data - Storing partitioning and
   ordering schemes in the in-memory table scan, and adding distributeBy
   and localSort to the DataFrame API.
   - SPARK-9858  Adaptive query execution - Initial support for
   automatically selecting the number of reducers for joins and
   aggregations.
   - SPARK-9241  Improved query planner ...
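
To make a couple of the items above concrete for testers, here is a small,
self-contained sketch that exercises the trackStateByKey-to-mapWithState
rename and the new SQL-queries-on-files syntax. It is an illustration
written for this summary rather than code from the release; the object name,
local master, socket source, and file paths are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

    object Rc2Smoke {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("rc2-smoke").setMaster("local[2]")
        val sc = new SparkContext(conf)

        // SPARK-11197: run SQL directly over a file without registering a table
        // (the Parquet path is a placeholder).
        val sqlContext = new SQLContext(sc)
        sqlContext.sql("SELECT * FROM parquet.`/tmp/some-data.parquet`").show()

        // SPARK-2629: trackStateByKey is now mapWithState, driven by a StateSpec.
        val ssc = new StreamingContext(sc, Seconds(1))
        ssc.checkpoint("/tmp/rc2-smoke-checkpoint")
        val counts = ssc.socketTextStream("localhost", 9999)
          .map(word => (word, 1))
          .mapWithState(StateSpec.function {
            (word: String, one: Option[Int], state: State[Int]) =>
              val total = state.getOption.getOrElse(0) + one.getOrElse(0)
              state.update(total) // running count kept in streaming state
              (word, total)
          })
        counts.print()

        ssc.start()
        ssc.awaitTerminationOrTimeout(10000)
        ssc.stop()
      }
    }

Running the same job against 1.5.x and against this RC is one quick way to
surface behavioral regressions around the renamed API.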

Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Benjamin Fradet
-1

For me the docs are not displaying except for the first page; for example,
http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-docs/mllib-guide.html
is a blank page.
This is because of SPARK-12199: the Element[W|w]iseProductExample.scala file
name referenced in the docs does not match the actual file name.


Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Michael Armbrust
Thanks Ben, but as I said in the first email, docs are published separately
from the release, so this isn't a valid reason to down vote the RC.  We
just provide them to help with testing.

I'll ask the mllib guys to take a look at that patch though.

RE: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Jean-Baptiste Onofré


+1 (non binding)
Tested with different samples.
Regards,
JB

Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Yin Huai
+1

Critical and blocker issues of SQL have been addressed.


Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Mark Hamstra
+1


Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Sean Owen
I've heard this argument before, but don't quite get it. Documentation is
part of a release, and I believe is something we're voting on here too, and
therefore needs to 'work' as documentation. We could not release this HTML
to the Apache site, so I think that does actually mean the artifacts
including docs don't work as a release.

Yes, I can see that the non-code artifacts can be released a little bit
after the code artifacts with last minute fixes. But, the whole release can
just happen later too. Why wouldn't this be a valid reason to block the
release?

Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Sean Owen
(I can't -1 this.) I do agree that docs have been treated as if separate
from releases in the past. With more maturity in the release process, I'm
questioning that now, as I don't think it's normal. It would be a reason to
release or not release this particular tarball, so a vote thread is the
right place to discuss it.

I'm surprised you're suggesting there's not a coupling between a release's
code and the docs for that release. If a release happens and some time
later docs come out, that has some effect on people's usage. Surely, the
ideal is for docs for x.y to come from the bits for x.y, and thus are
available at the same time.

Reality is something else, and your argument is a practical one: the release
is again behind, so shouldn't we overlook this minor problem to get it out?
This particular problem has to get fixed soon, we agree. It's minor by
virtue of being, hopefully, temporary.

But if it can/will be fixed quickly, what's the hurry? I get it, people
want releases sooner rather than later, all else equal, but this is always true.
It'd be nice to talk about what behaviors have led to being behind schedule
and this perceived rush to finish now, since this same thing has happened
in 1.5, 1.4. I'd rather at least collect some opinions on it than
invalidate the question.


Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Michael Armbrust
>
> I'm surprised you're suggesting there's not a coupling between a release's
> code and the docs for that release. If a release happens and some time
> later docs come out, that has some effect on people's usage.
>

I'm only suggesting that we shouldn't delay testing of the actual bits, or
wait to iterate on another RC.  Ideally docs should come out with the
actual release announcement (and I'll do everything in my power to make
this happen).  They should also be updated regularly as small issues are
found.

But if it can/will be fixed quickly, what's the hurry? I get it, people
> want releases sooner rather than later, all else equal, but this is always true.
> It'd be nice to talk about what behaviors have led to being behind schedule
> and this perceived rush to finish now, since this same thing has happened
> in 1.5, 1.4. I'd rather at least collect some opinions on it than
> invalidate the question.
>

I'm happy to debate concrete process suggestions on another thread.


Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Michael Armbrust
Sean, if you would like to -1 the release you are certainly entitled to,
but in the past we have never held a release for documentation only
issues.  If you'd like to change the policy of the project I'm not sure
that a voting thread is the right place to do it.

I think the right question here is, "How are users going to be affected by
this temporary issue?".  Given that I'm pretty certain that no users build
the documentation from the release themselves and instead consume it from
the published documentation, the docs contained in the release seem less
important as far as voting on the artifacts is concerned.

In contrast, there have been several threads on the users list asking when
the release is going to happen.  Should we make them wait longer for
something that isn't going to affect their usage of the release?  I would
vote no.  That doesn't mean that we shouldn't fix the documentation issue.
It just means we shouldn't add unnecessary coupling where it has no benefit.
