Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-15 Thread Iulian Dragoș
Thanks for the heads up.

On Tue, Dec 15, 2015 at 11:40 PM, Michael Armbrust 
wrote:

> This vote is canceled due to the issue with the incorrect version.  This
> issue will be fixed by https://github.com/apache/spark/pull/10317
>
> We can wait a little bit for a fix to
> https://issues.apache.org/jira/browse/SPARK-12345.  However if it looks
> like there is not an easy fix coming soon, I'm planning to move forward
> with RC3.

Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-15 Thread Michael Armbrust
This vote is canceled due to the issue with the incorrect version.  This
issue will be fixed by https://github.com/apache/spark/pull/10317

We can wait a little bit for a fix to
https://issues.apache.org/jira/browse/SPARK-12345.  However if it looks
like there is not an easy fix coming soon, I'm planning to move forward
with RC3.


Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-15 Thread Iulian Dragoș
-1 (non-binding)

Cluster mode on Mesos is broken (regression compared to 1.5.2). It seems to
be related to the way SPARK_HOME is handled. In the driver logs I see:

I1215 15:00:39.411212 28032 exec.cpp:134] Version: 0.25.0
I1215 15:00:39.413512 28037 exec.cpp:208] Executor registered on slave 130bdc39-44e7-4256-8c22-602040d337f1-S1
bin/spark-submit: line 27: /Users/dragos/workspace/Spark/dev/rc-tests/spark-1.6.0-bin-hadoop2.6/bin/spark-class: No such file or directory

The path is my local SPARK_HOME, which of course is not the one on the
Mesos slave.

iulian


Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-14 Thread Mark Hamstra
I'm afraid you're correct, Krishna:

core/src/main/scala/org/apache/spark/package.scala:  val SPARK_VERSION = "1.6.0-SNAPSHOT"
docs/_config.yml:SPARK_VERSION: 1.6.0-SNAPSHOT
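
A check like the one below could catch leftover snapshot versions before a
release candidate is tagged. This is only a minimal sketch and not part of
the Spark release tooling: the function name find_snapshot_versions, the
default "-SNAPSHOT" marker, and the throwaway directory built in demo() are
all assumptions made for illustration.

```python
import os
import tempfile


def find_snapshot_versions(root, marker="-SNAPSHOT"):
    """Return sorted (relative_path, line_number, line) tuples for lines containing marker."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as handle:
                    for number, line in enumerate(handle, start=1):
                        if marker in line:
                            rel = os.path.relpath(path, root)
                            hits.append((rel, number, line.strip()))
            except (UnicodeDecodeError, OSError):
                # Stop reading files that are binary or cannot be opened.
                continue
    return sorted(hits)


def demo():
    # Hypothetical tree that mimics the two files named in this message.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "docs"))
        with open(os.path.join(root, "docs", "_config.yml"), "w", encoding="utf-8") as f:
            f.write("SPARK_VERSION: 1.6.0-SNAPSHOT\n")
        with open(os.path.join(root, "package.scala"), "w", encoding="utf-8") as f:
            f.write('package object spark {\n  val SPARK_VERSION = "1.6.0-SNAPSHOT"\n}\n')
        return find_snapshot_versions(root)


if __name__ == "__main__":
    for path, number, line in demo():
        print(f"{path}:{number}: {line}")
```

Pointed at a real checkout, find_snapshot_versions(".") would also report
build files and other files that legitimately mention snapshot versions, so
the result is a list to review rather than a pass/fail signal.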


Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-14 Thread Krishna Sankar
Guys,
   sc.version returns 1.6.0-SNAPSHOT; it needs to be changed to 1.6.0. Can
you please verify?
Cheers
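
The check described above can be automated when testing a release candidate.
The sketch below is an assumption-laden illustration, not Spark code: the
helper names is_release_version and check_release_version are hypothetical,
and the pattern only accepts plain MAJOR.MINOR.PATCH strings.

```python
import re

# Plain release versions look like "1.6.0": three dot-separated numbers, no suffix.
RELEASE_VERSION = re.compile(r"\d+\.\d+\.\d+")


def is_release_version(version):
    """True when version has no suffix such as -SNAPSHOT or -rc2."""
    return RELEASE_VERSION.fullmatch(version) is not None


def check_release_version(actual, expected):
    """Return a list of human-readable problems; an empty list means the check passed."""
    problems = []
    if not is_release_version(actual):
        problems.append(f"{actual!r} is not a plain release version")
    if actual != expected:
        problems.append(f"expected {expected!r}, got {actual!r}")
    return problems


if __name__ == "__main__":
    for problem in check_release_version("1.6.0-SNAPSHOT", "1.6.0"):
        print(problem)
```

In a PySpark shell the first argument could be sc.version, for example
check_release_version(sc.version, "1.6.0").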



Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-14 Thread Kousuke Saruta

+1 (non-binding)

Tested some workloads using the basic API and the DataFrame API on my 4-node
YARN cluster (1 master and 3 slaves).

I also tested the Web UI.

(I'm resending this mail just in case because it seems that I failed to 
send the mail to dev@)


Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-14 Thread Andrew Or
+1

Ran PageRank on standalone mode with 4 nodes and noticed a speedup after
the specific commits that were in RC2 but not RC1:

c247b6a Dec 10 [SPARK-12155][SPARK-12253] Fix executor OOM in unified
memory management
05e441e Dec 9 [SPARK-12165][SPARK-12189] Fix bugs in eviction of storage
memory by execution

Also jobs that triggered these issues now run successfully.



Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-14 Thread Reynold Xin
+1

Tested some dataframe operations on my Mac.

On Saturday, December 12, 2015, Michael Armbrust 
wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 1.6.0!
>
> The vote is open until Tuesday, December 15, 2015 at 6:00 UTC and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.6.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v1.6.0-rc2
> (23f8dfd45187cb8f2216328ab907ddb5fbdffd0b)
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1169/
>
> The test repository (versioned as v1.6.0-rc2) for this release can be
> found at:
> https://repository.apache.org/content/repositories/orgapachespark-1168/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-docs/
>
> =======================================
> == How can I help test this release? ==
> =======================================
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> ================================================
> == What justifies a -1 vote for this release? ==
> ================================================
> This vote is happening towards the end of the 1.6 QA period, so -1 votes
> should only occur for significant regressions from 1.5. Bugs already
> present in 1.5, minor regressions, or bugs related to new features will not
> block this release.
>
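> ===============================================================
> == What should happen to JIRA tickets still targeting 1.6.0? ==
> ===============================================================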
> 1. It is OK for documentation patches to target 1.6.0 and still go into
> branch-1.6, since documentations will be published separately from the
> release.
> 2. New features for non-alpha-modules should target 1.7+.
> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
> version.
>
>
> ==================================================
> == Major changes to help you focus your testing ==
> ==================================================
>
> Spark 1.6.0 Preview
>
> Notable changes since 1.6 RC1
>
> Spark Streaming
>
>    - SPARK-2629 trackStateByKey has been renamed to mapWithState
>
> Spark SQL
>
>    - SPARK-12165 SPARK-12189 Fix bugs in eviction of storage memory by
>      execution.
>    - SPARK-12258 Correct passing null into ScalaUDF
>
> Notable Features Since 1.5
>
> Spark SQL
>
>    - SPARK-11787 Parquet Performance - Improve Parquet scan performance
>      when using flat schemas.
>    - SPARK-10810 Session Management - Isolated default database (i.e. USE
>      mydb) even on shared clusters.
>    - SPARK- Dataset API - A type-safe API (similar to RDDs) that performs
>      many operations on serialized binary data and code generation (i.e.
>      Project Tungsten).
>    - SPARK-1 Unified Memory Management - Shared memory for execution and
>      caching instead of exclusive division of the regions.
>    - SPARK-11197 SQL Queries on Files - Concise syntax for running SQL
>      queries over files of any supported format without registering a
>      table.
>    - SPARK-11745 Reading non-standard JSON files - Added options to read
>      non-standard JSON files (e.g. single-quotes, unquoted attributes)
>    - SPARK-10412 Per-operator Metrics for SQL Execution - Display
>      statistics on a per-operator basis for memory usage and spilled data
>      size.
>    - SPARK-11329 Star (*) expansion for StructTypes - Makes it easier to
>      nest and unnest arbitrary numbers of columns
>    - SPARK-10917, SPARK-11149 In-memory Columnar Cache Performance -

Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-14 Thread Michael Armbrust
Here is a fixed version of the docs for 1.6:
http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-docsfixed-docs

There might still be some minor rendering issues with the ML page, but people
are investigating.

On Sat, Dec 12, 2015 at 6:58 PM, Burak Yavuz  wrote:

> +1 tested SparkSQL and Streaming on some production sized workloads
>
> On Sat, Dec 12, 2015 at 4:16 PM, Mark Hamstra 
> wrote:
>
>> +1
>>
>> On Sat, Dec 12, 2015 at 9:39 AM, Michael Armbrust wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 1.6.0!
>>>
>>> The vote is open until Tuesday, December 15, 2015 at 6:00 UTC and
>>> passes if a majority of at least 3 +1 PMC votes are cast.
>>>
>>> [ ] +1 Release this package as Apache Spark 1.6.0
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is *v1.6.0-rc2
>>> (23f8dfd45187cb8f2216328ab907ddb5fbdffd0b)
>>> *
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://people.apache.org/keys/committer/pwendell.asc
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1169/
>>>
>>> The test repository (versioned as v1.6.0-rc2) for this release can be
>>> found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1168/
>>>
>>> The documentation corresponding to this release can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-docs/
>>>
>>> ===
>>> == How can I help test this release? ==
>>> ===
>>> If you are a Spark user, you can help us test this release by taking an
>>> existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> 
>>> == What justifies a -1 vote for this release? ==
>>> 
>>> This vote is happening towards the end of the 1.6 QA period, so -1 votes
>>> should only occur for significant regressions from 1.5. Bugs already
>>> present in 1.5, minor regressions, or bugs related to new features will not
>>> block this release.
>>>
>>> ===
>>> == What should happen to JIRA tickets still targeting 1.6.0? ==
>>> ===
>>> 1. It is OK for documentation patches to target 1.6.0 and still go into
>>> branch-1.6, since documentation will be published separately from the
>>> release.
>>> 2. New features for non-alpha-modules should target 1.7+.
>>> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the
>>> target version.
>>>
>>>
>>> ==
>>> == Major changes to help you focus your testing ==
>>> ==
>>>
>>> Spark 1.6.0 Preview
>>>
>>> Notable changes since 1.6 RC1
>>>
>>> Spark Streaming
>>>
>>>- SPARK-2629  
>>>trackStateByKey has been renamed to mapWithState
>>>
>>> Spark SQL
>>>
>>>- SPARK-12165 
>>>SPARK-12189  Fix
>>>bugs in eviction of storage memory by execution.
>>>- SPARK-12258  correct
>>>passing null into ScalaUDF
>>>
>>> Notable Features Since 1.5
>>>
>>> Spark SQL
>>>
>>>- SPARK-11787  Parquet
>>>Performance - Improve Parquet scan performance when using flat
>>>schemas.
>>>- SPARK-10810 
>>>Session Management - Isolated default database (i.e. USE mydb) even
>>>on shared clusters.
>>>- SPARK-   Dataset
>>>API - A type-safe API (similar to RDDs) that performs many
>>>operations on serialized binary data and code generation (i.e. Project
>>>Tungsten).
>>>- SPARK-1  Unified
>>>Memory Management - Shared memory for execution and caching instead
>>>of exclusive division of the regions.
>>>- SPARK-11197  SQL
>>>Queries on Files - Concise syntax for running SQL queries over files
>>>of any supported format without registering a table.
>>>- SPARK-11745  Reading
>>>non-standard JSON files - Added options to read non-standard JSON
>>>files (e.g. single-quotes, unquoted attri

Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-14 Thread Sean Owen
With Java 7 / Ubuntu 15, and "-Pyarn -Phadoop-2.6 -Phive
-Phive-thriftserver", I still see the Docker tests fail every time. Is
anyone else seeing them fail (or running them)?

The Hive CliSuite also fails (stack trace at the bottom).

Same deal -- if people are running this test and it's not failing,
this is probably just flakiness of some form.

There's the aforementioned doc generation issue too.

Other than that it compiled and ran all tests for me.

JIRA score: 28 issues, of which 11 bugs, of which 5 critical (listed
below), of which 0 blockers. OK there.



Critical bugs:
SPARK-8447 Test external shuffle service with all shuffle managers
SPARK-10680 Flaky test:
network.RequestTimeoutIntegrationSuite.timeoutInactiveRequests
SPARK-11224 Flaky test: o.a.s.ExternalShuffleServiceSuite
SPARK-11266 Peak memory tests swallow failures
SPARK-11293 Spillable collections leak shuffle memory



- Simple commands *** FAILED ***
  ===
  CliSuite failure output
  ===
  Spark SQL CLI command line: ../../bin/spark-sql --master local
--driver-java-options -Dderby.system.durability=test --conf
spark.ui.enabled=false --hiveconf
javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/home/srowen/spark-1.6.0/sql/hive-thriftserver/target/tmp/spark-240e9e22-8fe8-408b-a116-2a894b3cbf1f;create=true
--hiveconf 
hive.metastore.warehouse.dir=/home/srowen/spark-1.6.0/sql/hive-thriftserver/target/tmp/spark-c336bc67-8e51-4284-b574-e8b79d0d4fce
--hiveconf 
hive.exec.scratchdir=/home/srowen/spark-1.6.0/sql/hive-thriftserver/target/tmp/spark-3a4f9564-d9f1-467f-8016-d4c95389e568
  Exception: java.util.concurrent.TimeoutException: Futures timed out
after [3 minutes]
  Executed query 0 "CREATE TABLE hive_test(key INT, val STRING);",
  But failed to capture expected output "OK" within 3 minutes.

  2015-12-14 13:47:23.07 - stderr> SLF4J: Class path contains multiple
SLF4J bindings.
  2015-12-14 13:47:23.07 - stderr> SLF4J: Found binding in
[jar:file:/home/srowen/spark-1.6.0/assembly/target/scala-2.10/spark-assembly-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  2015-12-14 13:47:23.07 - stderr> SLF4J: Found binding in
[jar:file:/home/srowen/.m2/repository/org/slf4j/slf4j-log4j12/1.7.10/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  2015-12-14 13:47:23.07 - stderr> SLF4J: See
http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  2015-12-14 13:47:23.074 - stderr> SLF4J: Actual binding is of type
[org.slf4j.impl.Log4jLoggerFactory]
  2015-12-14 13:47:39.36 - stdout> SET spark.sql.hive.version=1.2.1
  ===
  End CliSuite failure output
  === (CliSuite.scala:151)




On Sat, Dec 12, 2015 at 5:39 PM, Michael Armbrust
 wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 1.6.0!
>
> The vote is open until Tuesday, December 15, 2015 at 6:00 UTC and passes if
> a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.6.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v1.6.0-rc2
> (23f8dfd45187cb8f2216328ab907ddb5fbdffd0b)
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1169/
>
> The test repository (versioned as v1.6.0-rc2) for this release can be found
> at:
> https://repository.apache.org/content/repositories/orgapachespark-1168/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-docs/
>
> ===
> == How can I help test this release? ==
> ===
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> 
> == What justifies a -1 vote for this release? ==
> 
> This vote is happening towards the end of the 1.6 QA period, so -1 votes
> should only occur for significant regressions from 1.5. Bugs already present
> in 1.5, minor regressions, or bugs related to new features will not block
> this release.
>
> ===
> == What should happen to JIRA tickets still targeting 1.6.0? ==
> ===
> 1. It is OK for documentation patches to target 1.6.0 and still go into
> branch-1.6, since documentation will be published separately fro

Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-14 Thread Ricardo Almeida
+1 (non binding)

Tested our workloads on a standalone cluster:
- Spark Core
- Spark SQL
- Spark MLlib
- Python API



On 12 December 2015 at 18:39, Michael Armbrust 
wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 1.6.0!
>
> The vote is open until Tuesday, December 15, 2015 at 6:00 UTC and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.6.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is *v1.6.0-rc2
> (23f8dfd45187cb8f2216328ab907ddb5fbdffd0b)
> *
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1169/
>
> The test repository (versioned as v1.6.0-rc2) for this release can be
> found at:
> https://repository.apache.org/content/repositories/orgapachespark-1168/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-docs/
>
> ===
> == How can I help test this release? ==
> ===
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> 
> == What justifies a -1 vote for this release? ==
> 
> This vote is happening towards the end of the 1.6 QA period, so -1 votes
> should only occur for significant regressions from 1.5. Bugs already
> present in 1.5, minor regressions, or bugs related to new features will not
> block this release.
>
> ===
> == What should happen to JIRA tickets still targeting 1.6.0? ==
> ===
> 1. It is OK for documentation patches to target 1.6.0 and still go into
> branch-1.6, since documentation will be published separately from the
> release.
> 2. New features for non-alpha-modules should target 1.7+.
> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
> version.
>
>
> ==
> == Major changes to help you focus your testing ==
> ==
>
> Spark 1.6.0 Preview
>
> Notable changes since 1.6 RC1
>
> Spark Streaming
>
>- SPARK-2629  
>trackStateByKey has been renamed to mapWithState
>
> Spark SQL
>
>- SPARK-12165 
>SPARK-12189  Fix
>bugs in eviction of storage memory by execution.
>- SPARK-12258  correct
>passing null into ScalaUDF
>
> Notable Features Since 1.5
>
> Spark SQL
>
>- SPARK-11787  Parquet
>Performance - Improve Parquet scan performance when using flat schemas.
>- SPARK-10810 
>Session Management - Isolated default database (i.e. USE mydb) even on
>shared clusters.
>- SPARK-   Dataset
>API - A type-safe API (similar to RDDs) that performs many operations
>on serialized binary data and code generation (i.e. Project Tungsten).
>- SPARK-1  Unified
>Memory Management - Shared memory for execution and caching instead of
>exclusive division of the regions.
>- SPARK-11197  SQL
>Queries on Files - Concise syntax for running SQL queries over files
>of any supported format without registering a table.
>- SPARK-11745  Reading
>non-standard JSON files - Added options to read non-standard JSON
>files (e.g. single-quotes, unquoted attributes)
>- SPARK-10412  
> Per-operator
>Metrics for SQL Execution - Display statistics on a per-operator basis
>for memory usage and spilled data size.
>- SPARK-11329  Star
>(*) expansion for StructTypes - Makes it easier to nest and unnest
>arbitrary numbers of columns
>- SPARK-10917 ,
>SPARK-11149 

Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Joseph Bradley
+1
Ran all tests locally on Mac OS X, and MLlib with large workloads on a
cluster.

On Sat, Dec 12, 2015 at 6:58 PM, Burak Yavuz  wrote:

> +1 tested SparkSQL and Streaming on some production-sized workloads
>
> On Sat, Dec 12, 2015 at 4:16 PM, Mark Hamstra 
> wrote:
>
>> +1
>>
>> On Sat, Dec 12, 2015 at 9:39 AM, Michael Armbrust wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 1.6.0!
>>>
>>> The vote is open until Tuesday, December 15, 2015 at 6:00 UTC and
>>> passes if a majority of at least 3 +1 PMC votes are cast.
>>>
>>> [ ] +1 Release this package as Apache Spark 1.6.0
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is *v1.6.0-rc2
>>> (23f8dfd45187cb8f2216328ab907ddb5fbdffd0b)
>>> *
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://people.apache.org/keys/committer/pwendell.asc
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1169/
>>>
>>> The test repository (versioned as v1.6.0-rc2) for this release can be
>>> found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1168/
>>>
>>> The documentation corresponding to this release can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-docs/
>>>
>>> ===
>>> == How can I help test this release? ==
>>> ===
>>> If you are a Spark user, you can help us test this release by taking an
>>> existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> 
>>> == What justifies a -1 vote for this release? ==
>>> 
>>> This vote is happening towards the end of the 1.6 QA period, so -1 votes
>>> should only occur for significant regressions from 1.5. Bugs already
>>> present in 1.5, minor regressions, or bugs related to new features will not
>>> block this release.
>>>
>>> ===
>>> == What should happen to JIRA tickets still targeting 1.6.0? ==
>>> ===
>>> 1. It is OK for documentation patches to target 1.6.0 and still go into
>>> branch-1.6, since documentation will be published separately from the
>>> release.
>>> 2. New features for non-alpha-modules should target 1.7+.
>>> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the
>>> target version.
>>>
>>>
>>> ==
>>> == Major changes to help you focus your testing ==
>>> ==
>>>
>>> Spark 1.6.0 Preview
>>>
>>> Notable changes since 1.6 RC1
>>>
>>> Spark Streaming
>>>
>>>- SPARK-2629  
>>>trackStateByKey has been renamed to mapWithState
>>>
>>> Spark SQL
>>>
>>>- SPARK-12165 
>>>SPARK-12189  Fix
>>>bugs in eviction of storage memory by execution.
>>>- SPARK-12258  correct
>>>passing null into ScalaUDF
>>>
>>> Notable Features Since 1.5
>>>
>>> Spark SQL
>>>
>>>- SPARK-11787  Parquet
>>>Performance - Improve Parquet scan performance when using flat
>>>schemas.
>>>- SPARK-10810 
>>>Session Management - Isolated default database (i.e. USE mydb) even
>>>on shared clusters.
>>>- SPARK-   Dataset
>>>API - A type-safe API (similar to RDDs) that performs many
>>>operations on serialized binary data and code generation (i.e. Project
>>>Tungsten).
>>>- SPARK-1  Unified
>>>Memory Management - Shared memory for execution and caching instead
>>>of exclusive division of the regions.
>>>- SPARK-11197  SQL
>>>Queries on Files - Concise syntax for running SQL queries over files
>>>of any supported format without registering a table.
>>>- SPARK-11745  Reading
>>>non-standard JSON files - Added options to read non-standard JSON
>>>files (e.g. single-quotes, unquoted attributes)
>>>- SPARK-10412  
>>> Per-operator
>>>Metrics for SQL Execution - Display

Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Burak Yavuz
+1 tested SparkSQL and Streaming on some production-sized workloads

On Sat, Dec 12, 2015 at 4:16 PM, Mark Hamstra 
wrote:

> +1
>
> On Sat, Dec 12, 2015 at 9:39 AM, Michael Armbrust 
> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 1.6.0!
>>
>> The vote is open until Tuesday, December 15, 2015 at 6:00 UTC and passes
>> if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 1.6.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is *v1.6.0-rc2
>> (23f8dfd45187cb8f2216328ab907ddb5fbdffd0b)
>> *
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1169/
>>
>> The test repository (versioned as v1.6.0-rc2) for this release can be
>> found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1168/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-docs/
>>
>> ===
>> == How can I help test this release? ==
>> ===
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> 
>> == What justifies a -1 vote for this release? ==
>> 
>> This vote is happening towards the end of the 1.6 QA period, so -1 votes
>> should only occur for significant regressions from 1.5. Bugs already
>> present in 1.5, minor regressions, or bugs related to new features will not
>> block this release.
>>
>> ===
>> == What should happen to JIRA tickets still targeting 1.6.0? ==
>> ===
>> 1. It is OK for documentation patches to target 1.6.0 and still go into
>> branch-1.6, since documentation will be published separately from the
>> release.
>> 2. New features for non-alpha-modules should target 1.7+.
>> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
>> version.
>>
>>
>> ==
>> == Major changes to help you focus your testing ==
>> ==
>>
>> Spark 1.6.0 Preview
>>
>> Notable changes since 1.6 RC1
>>
>> Spark Streaming
>>
>>- SPARK-2629  
>>trackStateByKey has been renamed to mapWithState
>>
>> Spark SQL
>>
>>- SPARK-12165 
>>SPARK-12189  Fix
>>bugs in eviction of storage memory by execution.
>>- SPARK-12258  correct
>>passing null into ScalaUDF
>>
>> Notable Features Since 1.5
>>
>> Spark SQL
>>
>>- SPARK-11787  Parquet
>>Performance - Improve Parquet scan performance when using flat
>>schemas.
>>- SPARK-10810 
>>Session Management - Isolated default database (i.e. USE mydb) even on
>>shared clusters.
>>- SPARK-   Dataset
>>API - A type-safe API (similar to RDDs) that performs many operations
>>on serialized binary data and code generation (i.e. Project Tungsten).
>>- SPARK-1  Unified
>>Memory Management - Shared memory for execution and caching instead
>>of exclusive division of the regions.
>>- SPARK-11197  SQL
>>Queries on Files - Concise syntax for running SQL queries over files
>>of any supported format without registering a table.
>>- SPARK-11745  Reading
>>non-standard JSON files - Added options to read non-standard JSON
>>files (e.g. single-quotes, unquoted attributes)
>>- SPARK-10412  
>> Per-operator
>>Metrics for SQL Execution - Display statistics on a per-operator basis
>>for memory usage and spilled data size.
>>- SPARK-11329  Star
>>(*) expansion for StructTypes - Makes it easier to nest and unnest
>>arbitrary numbers o

Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Mark Hamstra
+1

On Sat, Dec 12, 2015 at 9:39 AM, Michael Armbrust 
wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 1.6.0!
>
> The vote is open until Tuesday, December 15, 2015 at 6:00 UTC and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.6.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is *v1.6.0-rc2
> (23f8dfd45187cb8f2216328ab907ddb5fbdffd0b)
> *
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1169/
>
> The test repository (versioned as v1.6.0-rc2) for this release can be
> found at:
> https://repository.apache.org/content/repositories/orgapachespark-1168/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-docs/
>
> ===
> == How can I help test this release? ==
> ===
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> 
> == What justifies a -1 vote for this release? ==
> 
> This vote is happening towards the end of the 1.6 QA period, so -1 votes
> should only occur for significant regressions from 1.5. Bugs already
> present in 1.5, minor regressions, or bugs related to new features will not
> block this release.
>
> ===
> == What should happen to JIRA tickets still targeting 1.6.0? ==
> ===
> 1. It is OK for documentation patches to target 1.6.0 and still go into
> branch-1.6, since documentation will be published separately from the
> release.
> 2. New features for non-alpha-modules should target 1.7+.
> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
> version.
>
>
> ==
> == Major changes to help you focus your testing ==
> ==
>
> Spark 1.6.0 Preview
>
> Notable changes since 1.6 RC1
>
> Spark Streaming
>
>- SPARK-2629  
>trackStateByKey has been renamed to mapWithState
>
> Spark SQL
>
>- SPARK-12165 
>SPARK-12189  Fix
>bugs in eviction of storage memory by execution.
>- SPARK-12258  correct
>passing null into ScalaUDF
>
> Notable Features Since 1.5
>
> Spark SQL
>
>- SPARK-11787  Parquet
>Performance - Improve Parquet scan performance when using flat schemas.
>- SPARK-10810 
>Session Management - Isolated default database (i.e. USE mydb) even on
>shared clusters.
>- SPARK-   Dataset
>API - A type-safe API (similar to RDDs) that performs many operations
>on serialized binary data and code generation (i.e. Project Tungsten).
>- SPARK-1  Unified
>Memory Management - Shared memory for execution and caching instead of
>exclusive division of the regions.
>- SPARK-11197  SQL
>Queries on Files - Concise syntax for running SQL queries over files
>of any supported format without registering a table.
>- SPARK-11745  Reading
>non-standard JSON files - Added options to read non-standard JSON
>files (e.g. single-quotes, unquoted attributes)
>- SPARK-10412  
> Per-operator
>Metrics for SQL Execution - Display statistics on a per-operator basis
>for memory usage and spilled data size.
>- SPARK-11329  Star
>(*) expansion for StructTypes - Makes it easier to nest and unnest
>arbitrary numbers of columns
>- SPARK-10917 ,
>SPARK-11149  In-memory
>Columnar Cache Performance - Significant (up to 14x) speed up when
>

Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Michael Armbrust
>
> I'm surprised you're suggesting there's not a coupling between a release's
> code and the docs for that release. If a release happens and some time
> later docs come out, that has some effect on people's usage.
>

I'm only suggesting that we shouldn't delay testing of the actual bits, or
wait to iterate on another RC.  Ideally docs should come out with the
actual release announcement (and I'll do everything in my power to make
this happen).  They should also be updated regularly as small issues are
found.

> But if it can/will be fixed quickly, what's the hurry? I get it, people
> want a release sooner rather than later, all else equal, but this is always true.
> It'd be nice to talk about what behaviors have led to being behind schedule
> and this perceived rush to finish now, since this same thing has happened
> in 1.5, 1.4. I'd rather at least collect some opinions on it than
> invalidate the question.
>

I'm happy to debate concrete process suggestions on another thread.


Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Sean Owen
(I can't -1 this.) I do agree that docs have been treated as if separate
from releases in the past. With more maturity in the release process, I'm
questioning that now, as I don't think it's normal. It would be a reason to
release or not release this particular tarball, so a vote thread is the
right place to discuss it.

I'm surprised you're suggesting there's not a coupling between a release's
code and the docs for that release. If a release happens and some time
later docs come out, that has some effect on people's usage. Surely, the
ideal is for docs for x.y to come from the bits for x.y, and thus are
available at the same time.

Reality is something else, and your argument is practical, that the release
is again behind and so shouldn't we overlook this minor problem to get it
out? This particular problem has to get fixed, soon, we agree. It's minor
by virtue of being hopefully temporary.

But if it can/will be fixed quickly, what's the hurry? I get it, people
want a release sooner rather than later, all else equal, but this is always true.
It'd be nice to talk about what behaviors have led to being behind schedule
and this perceived rush to finish now, since this same thing has happened
in 1.5, 1.4. I'd rather at least collect some opinions on it than
invalidate the question.

On Sat, Dec 12, 2015 at 11:17 PM, Michael Armbrust 
wrote:

> Sean, if you would like to -1 the release you are certainly entitled to,
> but in the past we have never held a release for documentation only
> issues.  If you'd like to change the policy of the project I'm not sure
> that a voting thread is the right place to do it.
>
> I think the right question here, is "How are users going to be affected by
> this temporary issue?".  Given that I'm pretty certain that no users build
> the documentation from the release themselves and instead consume it from
> the published documentation, the docs contained in the release seem less
> important as far as voting on the artifacts is concerned.
>
> In contrast, there have been several threads on the users list asking when
> the release is going to happen.  Should we make them wait longer for
> something that isn't going to affect their usage of the release?  I would
> vote no.  That doesn't mean that we shouldn't fix the documentation issue.
> It just means we shouldn't add unnecessary coupling where it has no benefit.
>
> On Sat, Dec 12, 2015 at 1:50 PM, Sean Owen  wrote:
>
>> I've heard this argument before, but don't quite get it. Documentation is
>> part of a release, and I believe is something we're voting on here too, and
>> therefore needs to 'work' as documentation. We could not release this HTML
>> to the Apache site, so I think that does actually mean the artifacts
>> including docs don't work as a release.
>>
>> Yes, I can see that the non-code artifacts can be released a little bit
>> after the code artifacts with last minute fixes. But, the whole release can
>> just happen later too. Why wouldn't this be a valid reason to block the
>> release?
>>
>> On Sat, Dec 12, 2015 at 6:31 PM, Michael Armbrust wrote:
>>
>>> Thanks Ben, but as I said in the first email, docs are published
>>> separately from the release, so this isn't a valid reason to down vote the
>>> RC.  We just provide them to help with testing.
>>>
>>> I'll ask the mllib guys to take a look at that patch though.
>>> On Dec 12, 2015 9:44 AM, "Benjamin Fradet" 
>>> wrote:
>>>
>>>> -1
>>>>
>>>> For me the docs are not displaying except for the first page, for
>>>> example
>>>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-docs/mllib-guide.html
>>>> is a blank page.
>>>> This is because of SPARK-12199:
>>>> Element[W|w]iseProductExample.scala is not the same in the docs and
>>>> the actual file name.
>>>>
>>>> On Sat, Dec 12, 2015 at 6:39 PM, Michael Armbrust <
>>>> mich...@databricks.com> wrote:
>>>>
>>>>> I'll kick off the voting with a +1.
>>>>>
>>>>> On Sat, Dec 12, 2015 at 9:39 AM, Michael Armbrust <
>>>>> mich...@databricks.com> wrote:
>>>>>
>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>> version 1.6.0!
>>>>>>
>>>>>> The vote is open until Tuesday, December 15, 2015 at 6:00 UTC and
>>>>>> passes if a majority of at least 3 +1 PMC votes are cast.
>>>>>>
>>>>>> [ ] +1 Release this package as Apache Spark 1.6.0
>>>>>> [ ] -1 Do not release this package because ...
>>>>>>
>>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>>
>>>>>> The tag to be voted on is *v1.6.0-rc2
>>>>>> (23f8dfd45187cb8f2216328ab907ddb5fbdffd0b)
>>>>>> *
>>>>>>
>>>>>> The release files, including signatures, digests, etc. can be found
>>>>>> at:
>>>>>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-bin/
>>>>>>
>>>>>> Release artifacts are signed with the following key:
>>>>>> https://people.apache.org/keys/committer/pwendell.asc
>>>>>>
>>>>>> T

Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Michael Armbrust
Sean, if you would like to -1 the release you are certainly entitled to,
but in the past we have never held a release for documentation only
issues.  If you'd like to change the policy of the project I'm not sure
that a voting thread is the right place to do it.

I think the right question here, is "How are users going to be affected by
this temporary issue?".  Given that I'm pretty certain that no users build
the documentation from the release themselves and instead consume it from
the published documentation, the docs contained in the release seem less
important as far as voting on the artifacts is concerned.

In contrast, there have been several threads on the users list asking when
the release is going to happen.  Should we make them wait longer for
something that isn't going to affect their usage of the release?  I would
vote no.  That doesn't mean that we shouldn't fix the documentation issue.
It just means we shouldn't add unnecessary coupling where it has no benefit.

On Sat, Dec 12, 2015 at 1:50 PM, Sean Owen  wrote:

> I've heard this argument before, but don't quite get it. Documentation is
> part of a release, and I believe is something we're voting on here too, and
> therefore needs to 'work' as documentation. We could not release this HTML
> to the Apache site, so I think that does actually mean the artifacts
> including docs don't work as a release.
>
> Yes, I can see that the non-code artifacts can be released a little bit
> after the code artifacts with last minute fixes. But, the whole release can
> just happen later too. Why wouldn't this be a valid reason to block the
> release?
>
> On Sat, Dec 12, 2015 at 6:31 PM, Michael Armbrust 
> wrote:
>
>> Thanks Ben, but as I said in the first email, docs are published
>> separately from the release, so this isn't a valid reason to down vote the
>> RC.  We just provide them to help with testing.
>>
>> I'll ask the mllib guys to take a look at that patch though.
>> On Dec 12, 2015 9:44 AM, "Benjamin Fradet" 
>> wrote:
>>
>>> -1
>>>
>>> For me the docs are not displaying except for the first page, for
>>> example
>>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-docs/mllib-guide.html
>>>  is
>>> a blank page.
>>> This is because of SPARK-12199
>>> :
>>> Element[W|w]iseProductExample.scala is not the same in the docs and the
>>> actual file name.
>>>
>>> On Sat, Dec 12, 2015 at 6:39 PM, Michael Armbrust <
>>> mich...@databricks.com> wrote:
>>>
>>>> I'll kick off the voting with a +1.
>>>>
>>>> On Sat, Dec 12, 2015 at 9:39 AM, Michael Armbrust <
>>>> mich...@databricks.com> wrote:
>>>>
>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>> version 1.6.0!
>>>>>
>>>>> The vote is open until Tuesday, December 15, 2015 at 6:00 UTC and
>>>>> passes if a majority of at least 3 +1 PMC votes are cast.
>>>>>
>>>>> [ ] +1 Release this package as Apache Spark 1.6.0
>>>>> [ ] -1 Do not release this package because ...
>>>>>
>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>
>>>>> The tag to be voted on is *v1.6.0-rc2
>>>>> (23f8dfd45187cb8f2216328ab907ddb5fbdffd0b)
>>>>> *
>>>>>
>>>>> The release files, including signatures, digests, etc. can be found at:
>>>>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-bin/
>>>>>
>>>>> Release artifacts are signed with the following key:
>>>>> https://people.apache.org/keys/committer/pwendell.asc
>>>>>
>>>>> The staging repository for this release can be found at:
>>>>> https://repository.apache.org/content/repositories/orgapachespark-1169/
>>>>>
>>>>> The test repository (versioned as v1.6.0-rc2) for this release can be
>>>>> found at:
>>>>> https://repository.apache.org/content/repositories/orgapachespark-1168/
>>>>>
>>>>> The documentation corresponding to this release can be found at:
>>>>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-docs/
>>>>>
>>>>> ===
>>>>> == How can I help test this release? ==
>>>>> ===
>>>>> If you are a Spark user, you can help us test this release by taking
>>>>> an existing Spark workload and running on this release candidate, then
>>>>> reporting any regressions.
>>>>>
>>>>>
>>>>> == What justifies a -1 vote for this release? ==
>>>>>
>>>>> This vote is happening towards the end of the 1.6 QA period, so -1
>>>>> votes should only occur for significant regressions from 1.5. Bugs already
>>>>> present in 1.5, minor regressions, or bugs related to new features will
>>>>> not
>>>>> block this release.
>>>>>
>>>>> ===
>>>>> == What should happen to JIRA tickets still targeting 1.6.0? ==
>>>>> ===
>>>>> 1. I

Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Sean Owen
I've heard this argument before, but don't quite get it. Documentation is
part of a release, and I believe is something we're voting on here too, and
therefore needs to 'work' as documentation. We could not release this HTML
to the Apache site, so I think that does actually mean the artifacts
including docs don't work as a release.

Yes, I can see that the non-code artifacts can be released a little bit
after the code artifacts with last minute fixes. But, the whole release can
just happen later too. Why wouldn't this be a valid reason to block the
release?

On Sat, Dec 12, 2015 at 6:31 PM, Michael Armbrust 
wrote:

> Thanks Ben, but as I said in the first email, docs are published
> separately from the release, so this isn't a valid reason to down vote the
> RC.  We just provide them to help with testing.
>
> I'll ask the mllib guys to take a look at that patch though.
> On Dec 12, 2015 9:44 AM, "Benjamin Fradet" 
> wrote:
>
>> -1
>>
>> For me the docs are not displaying except for the first page, for example
>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-docs/mllib-guide.html
>> is a blank page.
>> This is because of SPARK-12199:
>> Element[W|w]iseProductExample.scala is not the same in the docs and the
>> actual file name.
>>
>> On Sat, Dec 12, 2015 at 6:39 PM, Michael Armbrust wrote:
>>
>>> I'll kick off the voting with a +1.
>>>
>>> On Sat, Dec 12, 2015 at 9:39 AM, Michael Armbrust <
>>> mich...@databricks.com> wrote:
>>>
>>>> Please vote on releasing the following candidate as Apache Spark
>>>> version 1.6.0!
>>>>
>>>> The vote is open until Tuesday, December 15, 2015 at 6:00 UTC and
>>>> passes if a majority of at least 3 +1 PMC votes are cast.
>>>>
>>>> [ ] +1 Release this package as Apache Spark 1.6.0
>>>> [ ] -1 Do not release this package because ...
>>>>
>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>
>>>> The tag to be voted on is *v1.6.0-rc2
>>>> (23f8dfd45187cb8f2216328ab907ddb5fbdffd0b)
>>>> *
>>>>
>>>> The release files, including signatures, digests, etc. can be found at:
>>>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-bin/
>>>>
>>>> Release artifacts are signed with the following key:
>>>> https://people.apache.org/keys/committer/pwendell.asc
>>>>
>>>> The staging repository for this release can be found at:
>>>> https://repository.apache.org/content/repositories/orgapachespark-1169/
>>>>
>>>> The test repository (versioned as v1.6.0-rc2) for this release can be
>>>> found at:
>>>> https://repository.apache.org/content/repositories/orgapachespark-1168/
>>>>
>>>> The documentation corresponding to this release can be found at:
>>>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-docs/
>>>>
>>>> ===
>>>> == How can I help test this release? ==
>>>> ===
>>>> If you are a Spark user, you can help us test this release by taking an
>>>> existing Spark workload and running on this release candidate, then
>>>> reporting any regressions.
>>>>
>>>>
>>>> == What justifies a -1 vote for this release? ==
>>>>
>>>> This vote is happening towards the end of the 1.6 QA period, so -1
>>>> votes should only occur for significant regressions from 1.5. Bugs already
>>>> present in 1.5, minor regressions, or bugs related to new features will not
>>>> block this release.
>>>>
>>>> ===
>>>> == What should happen to JIRA tickets still targeting 1.6.0? ==
>>>> ===
>>>> 1. It is OK for documentation patches to target 1.6.0 and still go into
>>>> branch-1.6, since documentations will be published separately from the
>>>> release.
>>>> 2. New features for non-alpha-modules should target 1.7+.
>>>> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the
>>>> target version.
>>>>
>>>>
>>>> ==
>>>> == Major changes to help you focus your testing ==
>>>> ==
>>>>
>>>> Spark 1.6.0 Preview
>>>> Notable changes since 1.6 RC1
>>>> Spark Streaming
>>>>
>>>> - SPARK-2629 trackStateByKey has been renamed to mapWithState
>>>>
>>>> Spark SQL
>>>>
>>>> - SPARK-12165 SPARK-12189 Fix bugs in eviction of storage memory by
>>>>   execution.
>>>> - SPARK-12258 correct passing null into ScalaUDF
>>>>
>>>> Notable Features Since 1.5
>>>> Spark SQL
>>>>
>>>> - SPARK-11787

Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Yin Huai
+1

Critical and blocker issues of SQL have been addressed.

On Sat, Dec 12, 2015 at 9:39 AM, Michael Armbrust 
wrote:

> I'll kick off the voting with a +1.
>
> On Sat, Dec 12, 2015 at 9:39 AM, Michael Armbrust 
> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 1.6.0!
>>
>> The vote is open until Tuesday, December 15, 2015 at 6:00 UTC and passes
>> if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 1.6.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is *v1.6.0-rc2
>> (23f8dfd45187cb8f2216328ab907ddb5fbdffd0b)
>> *
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1169/
>>
>> The test repository (versioned as v1.6.0-rc2) for this release can be
>> found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1168/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-docs/
>>
>> ===
>> == How can I help test this release? ==
>> ===
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> 
>> == What justifies a -1 vote for this release? ==
>> 
>> This vote is happening towards the end of the 1.6 QA period, so -1 votes
>> should only occur for significant regressions from 1.5. Bugs already
>> present in 1.5, minor regressions, or bugs related to new features will not
>> block this release.
>>
>> ===
>> == What should happen to JIRA tickets still targeting 1.6.0? ==
>> ===
>> 1. It is OK for documentation patches to target 1.6.0 and still go into
>> branch-1.6, since documentations will be published separately from the
>> release.
>> 2. New features for non-alpha-modules should target 1.7+.
>> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
>> version.
>>
>>
>> ==
>> == Major changes to help you focus your testing ==
>> ==
>>
>> Spark 1.6.0 Preview
>> Notable changes since 1.6 RC1
>> Spark Streaming
>>
>> - SPARK-2629 trackStateByKey has been renamed to mapWithState
>>
>> Spark SQL
>>
>> - SPARK-12165 SPARK-12189 Fix bugs in eviction of storage memory by
>>   execution.
>> - SPARK-12258 correct passing null into ScalaUDF
>>
>> Notable Features Since 1.5
>> Spark SQL
>>
>> - SPARK-11787 Parquet Performance - Improve Parquet scan performance
>>   when using flat schemas.
>> - SPARK-10810 Session Management - Isolated default database (i.e. USE
>>   mydb) even on shared clusters.
>> - SPARK- Dataset API - A type-safe API (similar to RDDs) that performs
>>   many operations on serialized binary data and code generation (i.e.
>>   Project Tungsten).
>> - SPARK-1 Unified Memory Management - Shared memory for execution and
>>   caching instead of exclusive division of the regions.
>> - SPARK-11197 SQL Queries on Files - Concise syntax for running SQL
>>   queries over files of any supported format without registering a
>>   table.
>> - SPARK-11745 Reading non-standard JSON files - Added options to read
>>   non-standard JSON files (e.g. single-quotes, unquoted attributes)
>> - SPARK-10412 Per-operator Metrics for SQL Execution - Display
>>   statistics on a per-operator basis for memory usage and spilled data
>>   size.
>> - SPARK-11329 Star (*) expansion for StructTypes - Makes it easier to
>>   nest and un

RE: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Jean-Baptiste Onofré


+1 (non binding)
Tested with different samples.
Regards
JB



-------- Original message --------
From: Michael Armbrust  
Date: 12/12/2015  18:39  (GMT+01:00) 
To: dev@spark.apache.org 
Subject: [VOTE] Release Apache Spark 1.6.0 (RC2) 

Please vote on releasing the following candidate as Apache Spark version
1.6.0!

The vote is open until Tuesday, December 15, 2015 at 6:00 UTC and passes if
a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.6.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v1.6.0-rc2
(23f8dfd45187cb8f2216328ab907ddb5fbdffd0b)

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1169/

The test repository (versioned as v1.6.0-rc2) for this release can be
found at:
https://repository.apache.org/content/repositories/orgapachespark-1168/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-docs/

== How can I help test this release? ==
If you are a Spark user, you can help us test this release by taking an
existing Spark workload and running on this release candidate, then
reporting any regressions.

== What justifies a -1 vote for this release? ==
This vote is happening towards the end of the 1.6 QA period, so -1 votes
should only occur for significant regressions from 1.5. Bugs already
present in 1.5, minor regressions, or bugs related to new features will not
block this release.

== What should happen to JIRA tickets still targeting 1.6.0? ==
1. It is OK for documentation patches to target 1.6.0 and still go into
branch-1.6, since documentations will be published separately from the
release.
2. New features for non-alpha-modules should target 1.7+.
3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
version.

== Major changes to help you focus your testing ==

Spark 1.6.0 Preview
Notable changes since 1.6 RC1
Spark Streaming

- SPARK-2629 trackStateByKey has been renamed to mapWithState

Spark SQL

- SPARK-12165 SPARK-12189 Fix bugs in eviction of storage memory by
  execution.
- SPARK-12258 correct passing null into ScalaUDF

Notable Features Since 1.5
Spark SQL

- SPARK-11787 Parquet Performance - Improve Parquet scan performance when
  using flat schemas.
- SPARK-10810 Session Management - Isolated default database (i.e. USE
  mydb) even on shared clusters.
- SPARK- Dataset API - A type-safe API (similar to RDDs) that performs
  many operations on serialized binary data and code generation (i.e.
  Project Tungsten).
- SPARK-1 Unified Memory Management - Shared memory for execution and
  caching instead of exclusive division of the regions.
- SPARK-11197 SQL Queries on Files - Concise syntax for running SQL
  queries over files of any supported format without registering a table.
- SPARK-11745 Reading non-standard JSON files - Added options to read
  non-standard JSON files (e.g. single-quotes, unquoted attributes)
- SPARK-10412 Per-operator Metrics for SQL Execution - Display statistics
  on a per-operator basis for memory usage and spilled data size.
- SPARK-11329 Star (*) expansion for StructTypes - Makes it easier to nest
  and unnest arbitrary numbers of columns
- SPARK-10917, SPARK-11149 In-memory Columnar Cache Performance -
  Significant (up to 14x) speed up when caching data that contains complex
  types in DataFrames or SQL.
- SPARK-1 Fast null-safe joins - Joins using null-safe equality (<=>) will
  now execute using SortMergeJoin instead of computing a Cartesian product.
- SPARK-11389 SQL Execution Using Off-Heap Memory - Support for configuring
  query execution to occur using off-heap memory to avoid GC overhead
- SPARK-10978 Datasource API Avoid Double Filter - When implementing a
  datasource with filter pushdown, developers can now tell Spark SQL to
  avoid double evaluating a pushed-down filter.
- SPARK-4849 Advanced Layout of Cached Data - storing partitioning and
  ordering schemes in In-memory table scan, and adding distributeBy and
  localSort to DF API
- SPARK-9858 Adaptive query execution - Initial support for automatically
  selecting the number of reducers for joins and aggregations.
- SPARK-9241 Improved query planner for queries having disti

Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Michael Armbrust
Thanks Ben, but as I said in the first email, docs are published separately
from the release, so this isn't a valid reason to down vote the RC.  We
just provide them to help with testing.

I'll ask the mllib guys to take a look at that patch though.
On Dec 12, 2015 9:44 AM, "Benjamin Fradet" 
wrote:

> -1
>
> For me the docs are not displaying except for the first page, for example
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-docs/mllib-guide.html
> is a blank page.
> This is because of SPARK-12199:
> Element[W|w]iseProductExample.scala is not the same in the docs and the
> actual file name.
>
> On Sat, Dec 12, 2015 at 6:39 PM, Michael Armbrust 
> wrote:
>
>> I'll kick off the voting with a +1.
>>
>> On Sat, Dec 12, 2015 at 9:39 AM, Michael Armbrust wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 1.6.0!
>>>
>>> The vote is open until Tuesday, December 15, 2015 at 6:00 UTC and
>>> passes if a majority of at least 3 +1 PMC votes are cast.
>>>
>>> [ ] +1 Release this package as Apache Spark 1.6.0
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is *v1.6.0-rc2
>>> (23f8dfd45187cb8f2216328ab907ddb5fbdffd0b)
>>> *
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://people.apache.org/keys/committer/pwendell.asc
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1169/
>>>
>>> The test repository (versioned as v1.6.0-rc2) for this release can be
>>> found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1168/
>>>
>>> The documentation corresponding to this release can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-docs/
>>>
>>> ===
>>> == How can I help test this release? ==
>>> ===
>>> If you are a Spark user, you can help us test this release by taking an
>>> existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> 
>>> == What justifies a -1 vote for this release? ==
>>> 
>>> This vote is happening towards the end of the 1.6 QA period, so -1 votes
>>> should only occur for significant regressions from 1.5. Bugs already
>>> present in 1.5, minor regressions, or bugs related to new features will not
>>> block this release.
>>>
>>> ===
>>> == What should happen to JIRA tickets still targeting 1.6.0? ==
>>> ===
>>> 1. It is OK for documentation patches to target 1.6.0 and still go into
>>> branch-1.6, since documentations will be published separately from the
>>> release.
>>> 2. New features for non-alpha-modules should target 1.7+.
>>> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the
>>> target version.
>>>
>>>
>>> ==
>>> == Major changes to help you focus your testing ==
>>> ==
>>>
>>> Spark 1.6.0 Preview
>>> Notable changes since 1.6 RC1
>>> Spark Streaming
>>>
>>> - SPARK-2629 trackStateByKey has been renamed to mapWithState
>>>
>>> Spark SQL
>>>
>>> - SPARK-12165 SPARK-12189 Fix bugs in eviction of storage memory by
>>>   execution.
>>> - SPARK-12258 correct passing null into ScalaUDF
>>>
>>> Notable Features Since 1.5
>>> Spark SQL
>>>
>>> - SPARK-11787 Parquet Performance - Improve Parquet scan performance
>>>   when using flat schemas.
>>> - SPARK-10810 Session Management - Isolated default database (i.e. USE
>>>   mydb) even on shared clusters.
>>> - SPARK- Dataset API - A type-safe API (similar to RDDs) that performs
>>>   many operations on serialized binary data and code generation (i.e.
>>>   Project Tungsten).
>>> - SPARK-1 Unified Memory Management - Shared memory for execution and
>>>   caching instead of exclusive division of the regions.
>>> - SPARK-11197

Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Benjamin Fradet
-1

For me the docs are not displaying except for the first page, for example
http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-docs/mllib-guide.html
is a blank page.
This is because of SPARK-12199:
Element[W|w]iseProductExample.scala is not the same in the docs and the
actual file name.

On Sat, Dec 12, 2015 at 6:39 PM, Michael Armbrust 
wrote:

> I'll kick off the voting with a +1.
>
> On Sat, Dec 12, 2015 at 9:39 AM, Michael Armbrust 
> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 1.6.0!
>>
>> The vote is open until Tuesday, December 15, 2015 at 6:00 UTC and passes
>> if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 1.6.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is *v1.6.0-rc2
>> (23f8dfd45187cb8f2216328ab907ddb5fbdffd0b)
>> *
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1169/
>>
>> The test repository (versioned as v1.6.0-rc2) for this release can be
>> found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1168/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-docs/
>>
>> ===
>> == How can I help test this release? ==
>> ===
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> 
>> == What justifies a -1 vote for this release? ==
>> 
>> This vote is happening towards the end of the 1.6 QA period, so -1 votes
>> should only occur for significant regressions from 1.5. Bugs already
>> present in 1.5, minor regressions, or bugs related to new features will not
>> block this release.
>>
>> ===
>> == What should happen to JIRA tickets still targeting 1.6.0? ==
>> ===
>> 1. It is OK for documentation patches to target 1.6.0 and still go into
>> branch-1.6, since documentations will be published separately from the
>> release.
>> 2. New features for non-alpha-modules should target 1.7+.
>> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
>> version.
>>
>>
>> ==
>> == Major changes to help you focus your testing ==
>> ==
>>
>> Spark 1.6.0 Preview
>> Notable changes since 1.6 RC1
>> Spark Streaming
>>
>> - SPARK-2629 trackStateByKey has been renamed to mapWithState
>>
>> Spark SQL
>>
>> - SPARK-12165 SPARK-12189 Fix bugs in eviction of storage memory by
>>   execution.
>> - SPARK-12258 correct passing null into ScalaUDF
>>
>> Notable Features Since 1.5
>> Spark SQL
>>
>> - SPARK-11787 Parquet Performance - Improve Parquet scan performance
>>   when using flat schemas.
>> - SPARK-10810 Session Management - Isolated default database (i.e. USE
>>   mydb) even on shared clusters.
>> - SPARK- Dataset API - A type-safe API (similar to RDDs) that performs
>>   many operations on serialized binary data and code generation (i.e.
>>   Project Tungsten).
>> - SPARK-1 Unified Memory Management - Shared memory for execution and
>>   caching instead of exclusive division of the regions.
>> - SPARK-11197 SQL Queries on Files - Concise syntax for running SQL
>>   queries over files of any supported format without registering a
>>   table.
>> - SPARK-11745 Reading non-standard JSON files - Added options to read
>>   non-standard JSON files (e.g. single-quotes, unquoted attributes)
>> - SPARK-10412

Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-12 Thread Michael Armbrust
I'll kick off the voting with a +1.

On Sat, Dec 12, 2015 at 9:39 AM, Michael Armbrust 
wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 1.6.0!
>
> The vote is open until Tuesday, December 15, 2015 at 6:00 UTC and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.6.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is *v1.6.0-rc2
> (23f8dfd45187cb8f2216328ab907ddb5fbdffd0b)
> *
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1169/
>
> The test repository (versioned as v1.6.0-rc2) for this release can be
> found at:
> https://repository.apache.org/content/repositories/orgapachespark-1168/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-docs/
>
> ===
> == How can I help test this release? ==
> ===
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> 
> == What justifies a -1 vote for this release? ==
> 
> This vote is happening towards the end of the 1.6 QA period, so -1 votes
> should only occur for significant regressions from 1.5. Bugs already
> present in 1.5, minor regressions, or bugs related to new features will not
> block this release.
>
> ===
> == What should happen to JIRA tickets still targeting 1.6.0? ==
> ===
> 1. It is OK for documentation patches to target 1.6.0 and still go into
> branch-1.6, since documentations will be published separately from the
> release.
> 2. New features for non-alpha-modules should target 1.7+.
> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
> version.
>
>
> ==
> == Major changes to help you focus your testing ==
> ==
>
> Spark 1.6.0 Preview
> Notable changes since 1.6 RC1
> Spark Streaming
>
> - SPARK-2629 trackStateByKey has been renamed to mapWithState
>
> Spark SQL
>
> - SPARK-12165 SPARK-12189 Fix bugs in eviction of storage memory by
>   execution.
> - SPARK-12258 correct passing null into ScalaUDF
>
> Notable Features Since 1.5
> Spark SQL
>
> - SPARK-11787 Parquet Performance - Improve Parquet scan performance when
>   using flat schemas.
> - SPARK-10810 Session Management - Isolated default database (i.e. USE
>   mydb) even on shared clusters.
> - SPARK- Dataset API - A type-safe API (similar to RDDs) that performs
>   many operations on serialized binary data and code generation (i.e.
>   Project Tungsten).
> - SPARK-1 Unified Memory Management - Shared memory for execution and
>   caching instead of exclusive division of the regions.
> - SPARK-11197 SQL Queries on Files - Concise syntax for running SQL
>   queries over files of any supported format without registering a table.
> - SPARK-11745 Reading non-standard JSON files - Added options to read
>   non-standard JSON files (e.g. single-quotes, unquoted attributes)
> - SPARK-10412 Per-operator Metrics for SQL Execution - Display statistics
>   on a per-operator basis for memory usage and spilled data size.
> - SPARK-11329 Star (*) expansion for StructTypes - Makes it easier to nest
>   and unnest arbitrary numbers of columns
> - SPARK-10917, SPARK-11149 In-memory Columnar Cache Performance -
>   Significan