Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-21 Thread Michael Armbrust
It's come to my attention that there have been several bug fixes merged
since RC3:

  - SPARK-12404 - Fix serialization error for Datasets with
Timestamps/Arrays/Decimal
  - SPARK-12218 - Fix incorrect pushdown of filters to parquet
  - SPARK-12395 - Fix join columns of outer join for DataFrame using
  - SPARK-12413 - Fix Mesos HA

Normally, these would probably not be sufficient to hold the release;
however, with the holidays going on in the US this week, we don't have the
resources to finalize 1.6 until next Monday.  Given this delay anyway, I
propose that we cut one final RC with the above fixes and plan for the
actual release first thing next week.

I'll post RC4 shortly and cancel this vote if there are no objections.
Since this vote nearly passed with no major issues, I don't anticipate any
problems with RC4.

Michael

On Sat, Dec 19, 2015 at 11:44 PM, Jeff Zhang  wrote:

> +1 (non-binding)
>
> All the tests passed, and I ran it on an HDP 2.3.2 sandbox successfully.
>
> On Sun, Dec 20, 2015 at 10:43 AM, Luciano Resende 
> wrote:
>
>> +1 (non-binding)
>>
>> Tested standalone mode, SparkR, and a couple of streaming apps; all seem OK.
>>
>> On Wed, Dec 16, 2015 at 1:32 PM, Michael Armbrust wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 1.6.0!
>>>
>>> The vote is open until Saturday, December 19, 2015 at 18:00 UTC and
>>> passes if a majority of at least 3 +1 PMC votes are cast.
>>>
>>> [ ] +1 Release this package as Apache Spark 1.6.0
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v1.6.0-rc3
>>> (168c89e07c51fa24b0bb88582c739cec0acb44d7)
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://people.apache.org/keys/committer/pwendell.asc
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1174/
>>>
>>> The test repository (versioned as v1.6.0-rc3) for this release can be
>>> found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1173/
>>>
>>> The documentation corresponding to this release can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-docs/
>>>
>>> ===
>>> == How can I help test this release? ==
>>> ===
>>> If you are a Spark user, you can help us test this release by taking an
>>> existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> 
>>> == What justifies a -1 vote for this release? ==
>>> 
>>> This vote is happening towards the end of the 1.6 QA period, so -1 votes
>>> should only occur for significant regressions from 1.5. Bugs already
>>> present in 1.5, minor regressions, or bugs related to new features will not
>>> block this release.
>>>
>>> ===
>>> == What should happen to JIRA tickets still targeting 1.6.0? ==
>>> ===
>>> 1. It is OK for documentation patches to target 1.6.0 and still go into
>>> branch-1.6, since documentation will be published separately from the
>>> release.
>>> 2. New features for non-alpha-modules should target 1.7+.
>>> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the
>>> target version.
>>>
>>>
>>> ==
>>> == Major changes to help you focus your testing ==
>>> ==
>>>
>>> Notable changes since 1.6 RC2
>>> - SPARK_VERSION has been set correctly
>>> - SPARK-12199 ML Docs are publishing correctly
>>> - SPARK-12345 Mesos cluster mode has been fixed
>>>
>>> Notable changes since 1.6 RC1
>>> Spark Streaming
>>>
>>>- SPARK-2629  
>>>trackStateByKey has been renamed to mapWithState
>>>
>>> Spark SQL
>>>
>>>- SPARK-12165 
>>>SPARK-12189  Fix
>>>bugs in eviction of storage memory by execution.
>>>- SPARK-12258  correct
>>>passing null into ScalaUDF
>>>
>>> Notable Features Since 1.5
>>>
>>> Spark SQL
>>>
>>>- SPARK-11787  Parquet
>>>Performance - Improve Parquet scan performance when using flat
>>>schemas.
>>>- SPARK-10810 

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-19 Thread Luciano Resende
+1 (non-binding)

Tested standalone mode, SparkR, and a couple of streaming apps; all seem OK.

On Wed, Dec 16, 2015 at 1:32 PM, Michael Armbrust 
wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 1.6.0!
>
> The vote is open until Saturday, December 19, 2015 at 18:00 UTC and
> passes if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.6.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v1.6.0-rc3
> (168c89e07c51fa24b0bb88582c739cec0acb44d7)
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1174/
>
> The test repository (versioned as v1.6.0-rc3) for this release can be
> found at:
> https://repository.apache.org/content/repositories/orgapachespark-1173/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-docs/
>
> ===
> == How can I help test this release? ==
> ===
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> 
> == What justifies a -1 vote for this release? ==
> 
> This vote is happening towards the end of the 1.6 QA period, so -1 votes
> should only occur for significant regressions from 1.5. Bugs already
> present in 1.5, minor regressions, or bugs related to new features will not
> block this release.
>
> ===
> == What should happen to JIRA tickets still targeting 1.6.0? ==
> ===
> 1. It is OK for documentation patches to target 1.6.0 and still go into
> branch-1.6, since documentation will be published separately from the
> release.
> 2. New features for non-alpha-modules should target 1.7+.
> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
> version.
>
>
> ==
> == Major changes to help you focus your testing ==
> ==
>
> Notable changes since 1.6 RC2
> - SPARK_VERSION has been set correctly
> - SPARK-12199 ML Docs are publishing correctly
> - SPARK-12345 Mesos cluster mode has been fixed
>
> Notable changes since 1.6 RC1
> Spark Streaming
>
>- SPARK-2629  
>trackStateByKey has been renamed to mapWithState
>
> Spark SQL
>
>- SPARK-12165 
>SPARK-12189  Fix
>bugs in eviction of storage memory by execution.
>- SPARK-12258  correct
>passing null into ScalaUDF
>
> Notable Features Since 1.5
>
> Spark SQL
>
>- SPARK-11787  Parquet
>Performance - Improve Parquet scan performance when using flat schemas.
>- SPARK-10810 
>    Session Management - Isolated default database (i.e., USE mydb) even on
>shared clusters.
>- SPARK-   Dataset
>API - A type-safe API (similar to RDDs) that performs many operations
>on serialized binary data and code generation (i.e. Project Tungsten).
>- SPARK-1  Unified
>Memory Management - Shared memory for execution and caching instead of
>exclusive division of the regions.
>- SPARK-11197  SQL
>Queries on Files - Concise syntax for running SQL queries over files
>of any supported format without registering a table.
>- SPARK-11745  Reading
>non-standard JSON files - Added options to read non-standard JSON
>files (e.g. single-quotes, unquoted attributes)
>- SPARK-10412  
> Per-operator
>    Metrics for SQL Execution - Display statistics on a per-operator basis
>for memory usage and spilled data size.
>- SPARK-11329  Star
>(*) expansion for StructTypes - Makes it easier to nest and 

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-19 Thread Zsolt Tóth
+1 (non-binding)

Testing environment:
- CDH 5.5 single-node Docker
- Prebuilt spark-1.6.0-hadoop2.6.tgz
- yarn-cluster mode

Comparing outputs of Spark 1.5.x and 1.6.0-RC3:

Pyspark
OK?: K-Means (ml) - Note: our tests show a numerical diff here compared to
the 1.5.2 output. Since K-Means has a random factor, this can be expected
behaviour - is it because of SPARK-10779? If so, I think it should be
listed in the MLlib/ML docs.
OK: Logistic Regression (ml), Linear Regression (mllib)
OK: Nested Spark SQL query

SparkR
OK: Logistic Regression
OK: Nested Spark SQL query

Machine learning - Java:
OK: Decision Tree (mllib and ml): Gini, Entropy
OK: Random Forest (ml): Gini, Entropy
OK: Linear, Lasso, Ridge Regression (mllib)
OK: Logistic Regression (mllib): SGD, L-BFGS
OK: SVM (mllib)

I/O:
OK: Reading/Writing Parquet to/from DataFrame
OK: Reading/Writing Textfile to/from RDD

2015-12-19 2:09 GMT+01:00 Marcelo Vanzin :

> +1 (non-binding)
>
> Tested the without-hadoop binaries (so I didn't run Hive-related tests)
> with a test batch including standalone / client, yarn / client and
> cluster, including core, mllib and streaming (flume and kafka).
>
> On Wed, Dec 16, 2015 at 1:32 PM, Michael Armbrust
>  wrote:
> > Please vote on releasing the following candidate as Apache Spark version
> > 1.6.0!
> >
> > The vote is open until Saturday, December 19, 2015 at 18:00 UTC and
> passes
> > if a majority of at least 3 +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Spark 1.6.0
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see http://spark.apache.org/
> >
> > The tag to be voted on is v1.6.0-rc3
> > (168c89e07c51fa24b0bb88582c739cec0acb44d7)
> >
> > The release files, including signatures, digests, etc. can be found at:
> > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-bin/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/pwendell.asc
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1174/
> >
> > The test repository (versioned as v1.6.0-rc3) for this release can be
> found
> > at:
> > https://repository.apache.org/content/repositories/orgapachespark-1173/
> >
> > The documentation corresponding to this release can be found at:
> > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-docs/
> >
> > ===
> > == How can I help test this release? ==
> > ===
> > If you are a Spark user, you can help us test this release by taking an
> > existing Spark workload and running on this release candidate, then
> > reporting any regressions.
> >
> > 
> > == What justifies a -1 vote for this release? ==
> > 
> > This vote is happening towards the end of the 1.6 QA period, so -1 votes
> > should only occur for significant regressions from 1.5. Bugs already
> present
> > in 1.5, minor regressions, or bugs related to new features will not block
> > this release.
> >
> > ===
> > == What should happen to JIRA tickets still targeting 1.6.0? ==
> > ===
> > 1. It is OK for documentation patches to target 1.6.0 and still go into
> > branch-1.6, since documentation will be published separately from the
> > release.
> > 2. New features for non-alpha-modules should target 1.7+.
> > 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
> > version.
> >
> >
> > ==
> > == Major changes to help you focus your testing ==
> > ==
> >
> > Notable changes since 1.6 RC2
> >
> >
> > - SPARK_VERSION has been set correctly
> > - SPARK-12199 ML Docs are publishing correctly
> > - SPARK-12345 Mesos cluster mode has been fixed
> >
> > Notable changes since 1.6 RC1
> >
> > Spark Streaming
> >
> > SPARK-2629  trackStateByKey has been renamed to mapWithState
> >
> > Spark SQL
> >
> > SPARK-12165 SPARK-12189 Fix bugs in eviction of storage memory by
> execution.
> > SPARK-12258 correct passing null into ScalaUDF
> >
> > Notable Features Since 1.5
> >
> > Spark SQL
> >
> > SPARK-11787 Parquet Performance - Improve Parquet scan performance when
> > using flat schemas.
> > SPARK-10810 Session Management - Isolated default database (i.e., USE mydb)
> > even on shared clusters.
> > SPARK-  Dataset API - A type-safe API (similar to RDDs) that performs
> > many operations on serialized binary data and code generation (i.e.
> Project
> > Tungsten).
> > SPARK-1 Unified Memory Management - Shared memory for execution and
> > caching instead of exclusive division of the regions.
> > 

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-19 Thread Jeff Zhang
+1 (non-binding)

All the tests passed, and I ran it on an HDP 2.3.2 sandbox successfully.

On Sun, Dec 20, 2015 at 10:43 AM, Luciano Resende 
wrote:

> +1 (non-binding)
>
> Tested standalone mode, SparkR, and a couple of streaming apps; all seem OK.
>
> On Wed, Dec 16, 2015 at 1:32 PM, Michael Armbrust 
> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 1.6.0!
>>
>> The vote is open until Saturday, December 19, 2015 at 18:00 UTC and
>> passes if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 1.6.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v1.6.0-rc3
>> (168c89e07c51fa24b0bb88582c739cec0acb44d7)
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1174/
>>
>> The test repository (versioned as v1.6.0-rc3) for this release can be
>> found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1173/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-docs/
>>
>> ===
>> == How can I help test this release? ==
>> ===
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> 
>> == What justifies a -1 vote for this release? ==
>> 
>> This vote is happening towards the end of the 1.6 QA period, so -1 votes
>> should only occur for significant regressions from 1.5. Bugs already
>> present in 1.5, minor regressions, or bugs related to new features will not
>> block this release.
>>
>> ===
>> == What should happen to JIRA tickets still targeting 1.6.0? ==
>> ===
>> 1. It is OK for documentation patches to target 1.6.0 and still go into
>> branch-1.6, since documentation will be published separately from the
>> release.
>> 2. New features for non-alpha-modules should target 1.7+.
>> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
>> version.
>>
>>
>> ==
>> == Major changes to help you focus your testing ==
>> ==
>>
>> Notable changes since 1.6 RC2
>> - SPARK_VERSION has been set correctly
>> - SPARK-12199 ML Docs are publishing correctly
>> - SPARK-12345 Mesos cluster mode has been fixed
>>
>> Notable changes since 1.6 RC1
>> Spark Streaming
>>
>>- SPARK-2629  
>>trackStateByKey has been renamed to mapWithState
>>
>> Spark SQL
>>
>>- SPARK-12165 
>>SPARK-12189  Fix
>>bugs in eviction of storage memory by execution.
>>- SPARK-12258  correct
>>passing null into ScalaUDF
>>
>> Notable Features Since 1.5
>>
>> Spark SQL
>>
>>- SPARK-11787  Parquet
>>Performance - Improve Parquet scan performance when using flat
>>schemas.
>>- SPARK-10810 
>>    Session Management - Isolated default database (i.e., USE mydb) even on
>>shared clusters.
>>- SPARK-   Dataset
>>API - A type-safe API (similar to RDDs) that performs many operations
>>on serialized binary data and code generation (i.e. Project Tungsten).
>>- SPARK-1  Unified
>>Memory Management - Shared memory for execution and caching instead
>>of exclusive division of the regions.
>>- SPARK-11197  SQL
>>Queries on Files - Concise syntax for running SQL queries over files
>>of any supported format without registering a table.
>>- SPARK-11745  Reading
>>non-standard JSON files - Added options to read non-standard JSON
>>files (e.g. single-quotes, unquoted attributes)
>>- SPARK-10412 

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-18 Thread Daniel Darabos
+1 (non-binding)

It passes our tests after we registered 6 new classes with Kryo:


kryo.register(classOf[org.apache.spark.sql.catalyst.expressions.UnsafeRow])
kryo.register(classOf[Array[org.apache.spark.mllib.tree.model.Split]])
kryo.register(Class.forName("org.apache.spark.mllib.tree.model.Bin"))
kryo.register(Class.forName("[Lorg.apache.spark.mllib.tree.model.Bin;"))
kryo.register(Class.forName("org.apache.spark.mllib.tree.model.DummyLowSplit"))
kryo.register(Class.forName("org.apache.spark.mllib.tree.model.DummyHighSplit"))

It also spams "Managed memory leak detected; size = 15735058 bytes, TID =
847" for almost every task. I haven't yet figured out why.


On Fri, Dec 18, 2015 at 6:45 AM, Krishna Sankar  wrote:

> +1 (non-binding, of course)
>
> 1. Compiled OSX 10.10 (Yosemite) OK Total time: 29:32 min
>  mvn clean package -Pyarn -Phadoop-2.6 -DskipTests
> 2. Tested pyspark, mllib (iPython 4.0)
> 2.0 Spark version is 1.6.0
> 2.1. statistics (min,max,mean,Pearson,Spearman) OK
> 2.2. Linear/Ridge/Lasso Regression OK
> 2.3. Decision Tree, Naive Bayes OK
> 2.4. KMeans OK
>Center And Scale OK
> 2.5. RDD operations OK
>   State of the Union Texts - MapReduce, Filter,sortByKey (word count)
> 2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK
>Model evaluation/optimization (rank, numIter, lambda) with
> itertools OK
> 3. Scala - MLlib
> 3.1. statistics (min,max,mean,Pearson,Spearman) OK
> 3.2. LinearRegressionWithSGD OK
> 3.3. Decision Tree OK
> 3.4. KMeans OK
> 3.5. Recommendation (Movielens medium dataset ~1 M ratings) OK
> 3.6. saveAsParquetFile OK
> 3.7. Read and verify the 4.3 save(above) - sqlContext.parquetFile,
> registerTempTable, sql OK
> 3.8. result = sqlContext.sql("SELECT
> OrderDetails.OrderID,ShipCountry,UnitPrice,Qty,Discount FROM Orders INNER
> JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID") OK
> 4.0. Spark SQL from Python OK
> 4.1. result = sqlContext.sql("SELECT * from people WHERE State = 'WA'") OK
> 5.0. Packages
> 5.1. com.databricks.spark.csv - read/write OK (--packages
> com.databricks:spark-csv_2.10:1.3.0)
> 6.0. DataFrames
> 6.1. cast,dtypes OK
> 6.2. groupBy,avg,crosstab,corr,isNull,na.drop OK
> 6.3. All joins,sql,set operations,udf OK
>
> Cheers & Good work guys
> 
>
> On Wed, Dec 16, 2015 at 1:32 PM, Michael Armbrust 
> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 1.6.0!
>>
>> The vote is open until Saturday, December 19, 2015 at 18:00 UTC and
>> passes if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 1.6.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v1.6.0-rc3
>> (168c89e07c51fa24b0bb88582c739cec0acb44d7)
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1174/
>>
>> The test repository (versioned as v1.6.0-rc3) for this release can be
>> found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1173/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-docs/
>>
>> ===
>> == How can I help test this release? ==
>> ===
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> 
>> == What justifies a -1 vote for this release? ==
>> 
>> This vote is happening towards the end of the 1.6 QA period, so -1 votes
>> should only occur for significant regressions from 1.5. Bugs already
>> present in 1.5, minor regressions, or bugs related to new features will not
>> block this release.
>>
>> ===
>> == What should happen to JIRA tickets still targeting 1.6.0? ==
>> ===
>> 1. It is OK for documentation patches to target 1.6.0 and still go into
>> branch-1.6, since documentation will be published separately from the
>> release.
>> 2. New features for non-alpha-modules should target 1.7+.
>> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
>> version.
>>
>>
>> ==
>> == Major changes to help you focus 

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-18 Thread Sean Owen
For me, mostly the same as before: tests are mostly passing, but I can
never get the docker tests to pass. If anyone knows a special profile
or package that needs to be enabled, I can try that and/or
fix/document it. Just wondering if it's me.

I'm on Java 7 + Ubuntu 15.10, with -Pyarn -Phive -Phive-thriftserver
-Phadoop-2.6

On Wed, Dec 16, 2015 at 9:32 PM, Michael Armbrust
 wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 1.6.0!
>
> The vote is open until Saturday, December 19, 2015 at 18:00 UTC and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.6.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v1.6.0-rc3
> (168c89e07c51fa24b0bb88582c739cec0acb44d7)
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1174/
>
> The test repository (versioned as v1.6.0-rc3) for this release can be found
> at:
> https://repository.apache.org/content/repositories/orgapachespark-1173/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-docs/
>
> ===
> == How can I help test this release? ==
> ===
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> 
> == What justifies a -1 vote for this release? ==
> 
> This vote is happening towards the end of the 1.6 QA period, so -1 votes
> should only occur for significant regressions from 1.5. Bugs already present
> in 1.5, minor regressions, or bugs related to new features will not block
> this release.
>
> ===
> == What should happen to JIRA tickets still targeting 1.6.0? ==
> ===
> 1. It is OK for documentation patches to target 1.6.0 and still go into
> branch-1.6, since documentation will be published separately from the
> release.
> 2. New features for non-alpha-modules should target 1.7+.
> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
> version.
>
>
> ==
> == Major changes to help you focus your testing ==
> ==
>
> Notable changes since 1.6 RC2
>
>
> - SPARK_VERSION has been set correctly
> - SPARK-12199 ML Docs are publishing correctly
> - SPARK-12345 Mesos cluster mode has been fixed
>
> Notable changes since 1.6 RC1
>
> Spark Streaming
>
> SPARK-2629  trackStateByKey has been renamed to mapWithState
>
> Spark SQL
>
> SPARK-12165 SPARK-12189 Fix bugs in eviction of storage memory by execution.
> SPARK-12258 correct passing null into ScalaUDF
>
> Notable Features Since 1.5
>
> Spark SQL
>
> SPARK-11787 Parquet Performance - Improve Parquet scan performance when
> using flat schemas.
> SPARK-10810 Session Management - Isolated default database (i.e., USE mydb)
> even on shared clusters.
> SPARK-  Dataset API - A type-safe API (similar to RDDs) that performs
> many operations on serialized binary data and code generation (i.e. Project
> Tungsten).
> SPARK-1 Unified Memory Management - Shared memory for execution and
> caching instead of exclusive division of the regions.
> SPARK-11197 SQL Queries on Files - Concise syntax for running SQL queries
> over files of any supported format without registering a table.
> SPARK-11745 Reading non-standard JSON files - Added options to read
> non-standard JSON files (e.g. single-quotes, unquoted attributes)
> SPARK-10412 Per-operator Metrics for SQL Execution - Display statistics on a
> per-operator basis for memory usage and spilled data size.
> SPARK-11329 Star (*) expansion for StructTypes - Makes it easier to nest and
> unnest arbitrary numbers of columns
> SPARK-10917, SPARK-11149 In-memory Columnar Cache Performance - Significant
> (up to 14x) speed up when caching data that contains complex types in
> DataFrames or SQL.
> SPARK-1 Fast null-safe joins - Joins using null-safe equality (<=>) will
> now execute using SortMergeJoin instead of computing a cartesian product.
> SPARK-11389 SQL Execution Using Off-Heap Memory - Support for configuring
> query execution to occur using off-heap memory to avoid GC overhead
> SPARK-10978 Datasource API Avoid Double Filter 

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-18 Thread Tom Graves
+1.  Ran some regression tests on Spark on Yarn (hadoop 2.6 and 2.7).
Tom 


On Wednesday, December 16, 2015 3:32 PM, Michael Armbrust 
 wrote:
 

Please vote on releasing the following candidate as Apache Spark version 1.6.0!

The vote is open until Saturday, December 19, 2015 at 18:00 UTC and passes if a
majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.6.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v1.6.0-rc3 (168c89e07c51fa24b0bb88582c739cec0acb44d7)

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1174/

The test repository (versioned as v1.6.0-rc3) for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1173/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-docs/

===
== How can I help test this release? ==
===
If you are a Spark user, you can help us test this release by taking an
existing Spark workload and running on this release candidate, then
reporting any regressions.


== What justifies a -1 vote for this release? ==

This vote is happening towards the end of the 1.6 QA period, so -1 votes
should only occur for significant regressions from 1.5. Bugs already present
in 1.5, minor regressions, or bugs related to new features will not block
this release.

===
== What should happen to JIRA tickets still targeting 1.6.0? ==
===
1. It is OK for documentation patches to target 1.6.0 and still go into
branch-1.6, since documentation will be published separately from the release.
2. New features for non-alpha-modules should target 1.7+.
3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
version.

==
== Major changes to help you focus your testing ==
==
Notable changes since 1.6 RC2

- SPARK_VERSION has been set correctly
- SPARK-12199 ML Docs are publishing correctly
- SPARK-12345 Mesos cluster mode has been fixed

Notable changes since 1.6 RC1


Spark Streaming
   
   - SPARK-2629  trackStateByKey has been renamed to mapWithState

Spark SQL
   
   - SPARK-12165 SPARK-12189 Fix bugs in eviction of storage memory by 
execution.
   - SPARK-12258 correct passing null into ScalaUDF

Notable Features Since 1.5

Spark SQL
   
   - SPARK-11787 Parquet Performance - Improve Parquet scan performance when 
using flat schemas.
   - SPARK-10810 Session Management - Isolated default database (i.e., USE mydb) 
even on shared clusters.
   - SPARK-  Dataset API - A type-safe API (similar to RDDs) that performs 
many operations on serialized binary data and code generation (i.e. Project 
Tungsten).
   - SPARK-1 Unified Memory Management - Shared memory for execution and 
caching instead of exclusive division of the regions.
   - SPARK-11197 SQL Queries on Files - Concise syntax for running SQL queries 
over files of any supported format without registering a table.
   - SPARK-11745 Reading non-standard JSON files - Added options to read 
non-standard JSON files (e.g. single-quotes, unquoted attributes)
   - SPARK-10412 Per-operator Metrics for SQL Execution - Display statistics on 
a per-operator basis for memory usage and spilled data size.
   - SPARK-11329 Star (*) expansion for StructTypes - Makes it easier to nest 
and unnest arbitrary numbers of columns
   - SPARK-10917, SPARK-11149 In-memory Columnar Cache Performance - 
Significant (up to 14x) speed up when caching data that contains complex types 
in DataFrames or SQL.
   - SPARK-1 Fast null-safe joins - Joins using null-safe equality (<=>) 
will now execute using SortMergeJoin instead of computing a cartesian product.
   - SPARK-11389 SQL Execution Using Off-Heap Memory - Support for configuring 
query execution to occur using off-heap memory to avoid GC overhead
   - SPARK-10978 Datasource API Avoid Double Filter - When implementing a 
datasource with filter pushdown, developers can now tell Spark SQL to avoid 
double evaluating a pushed-down filter.
   - SPARK-4849  Advanced Layout of Cached Data - storing partitioning and 
ordering schemes in In-memory table scan, and adding distributeBy and localSort 
to DF API
   - SPARK-9858  Adaptive 

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-18 Thread Sean Owen
Yes that's what I mean. If they're not quite working, let's disable
them, but first, we have to rule out that I'm not just missing some
requirement.

Functionally, it's not worth blocking the release. It seems like bad
form to release with tests that always fail for a non-trivial number
of users, but we have to establish that. If it's something with an
easy fix (or needs disabling) and another RC needs to be baked, might
be worth including.

Logs coming offline

On Fri, Dec 18, 2015 at 5:30 PM, Mark Grover  wrote:
> Sean,
> Are you referring to docker integration tests? If so, they were disabled for
> the majority of the release, and I recently worked on it (SPARK-11796); once
> it got committed, the tests were re-enabled in Spark builds. I am not sure
> what OSs the test builds use, but they should be passing there too.
>
> During my work, I tested on Ubuntu Precise and they worked. If you could
> share the logs with me offline, I could take a look. Alternatively, I can
> try to see if I can get Ubuntu 15 instance. However, given the history of
> these tests, I personally don't think it makes sense to block the release
> based on them not running on Ubuntu 15.
>
> On Fri, Dec 18, 2015 at 9:22 AM, Sean Owen  wrote:
>>
>> For me, mostly the same as before: tests are mostly passing, but I can
>> never get the docker tests to pass. If anyone knows a special profile
>> or package that needs to be enabled, I can try that and/or
>> fix/document it. Just wondering if it's me.
>>
>> I'm on Java 7 + Ubuntu 15.10, with -Pyarn -Phive -Phive-thriftserver
>> -Phadoop-2.6
>>
>> On Wed, Dec 16, 2015 at 9:32 PM, Michael Armbrust
>>  wrote:
>> > Please vote on releasing the following candidate as Apache Spark version
>> > 1.6.0!
>> >
>> > The vote is open until Saturday, December 19, 2015 at 18:00 UTC and
>> > passes
>> > if a majority of at least 3 +1 PMC votes are cast.
>> >
>> > [ ] +1 Release this package as Apache Spark 1.6.0
>> > [ ] -1 Do not release this package because ...
>> >
>> > To learn more about Apache Spark, please see http://spark.apache.org/
>> >
>> > The tag to be voted on is v1.6.0-rc3
>> > (168c89e07c51fa24b0bb88582c739cec0acb44d7)
>> >
>> > The release files, including signatures, digests, etc. can be found at:
>> > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-bin/
>> >
>> > Release artifacts are signed with the following key:
>> > https://people.apache.org/keys/committer/pwendell.asc
>> >
>> > The staging repository for this release can be found at:
>> > https://repository.apache.org/content/repositories/orgapachespark-1174/
>> >
>> > The test repository (versioned as v1.6.0-rc3) for this release can be
>> > found
>> > at:
>> > https://repository.apache.org/content/repositories/orgapachespark-1173/
>> >
>> > The documentation corresponding to this release can be found at:
>> > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-docs/
>> >
>> > ===
>> > == How can I help test this release? ==
>> > ===
>> > If you are a Spark user, you can help us test this release by taking an
>> > existing Spark workload and running on this release candidate, then
>> > reporting any regressions.
>> >
>> > 
>> > == What justifies a -1 vote for this release? ==
>> > 
>> > This vote is happening towards the end of the 1.6 QA period, so -1 votes
>> > should only occur for significant regressions from 1.5. Bugs already
>> > present
>> > in 1.5, minor regressions, or bugs related to new features will not
>> > block
>> > this release.
>> >
>> > ===
>> > == What should happen to JIRA tickets still targeting 1.6.0? ==
>> > ===
>> > 1. It is OK for documentation patches to target 1.6.0 and still go into
>> > branch-1.6, since documentation will be published separately from the
>> > release.
>> > 2. New features for non-alpha-modules should target 1.7+.
>> > 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the
>> > target
>> > version.
>> >
>> >
>> > ==
>> > == Major changes to help you focus your testing ==
>> > ==
>> >
>> > Notable changes since 1.6 RC2
>> >
>> >
>> > - SPARK_VERSION has been set correctly
>> > - SPARK-12199 ML Docs are publishing correctly
>> > - SPARK-12345 Mesos cluster mode has been fixed
>> >
>> > Notable changes since 1.6 RC1
>> >
>> > Spark Streaming
>> >
>> > SPARK-2629  trackStateByKey has been renamed to mapWithState
>> >
>> > Spark SQL
>> >
>> > SPARK-12165 SPARK-12189 Fix bugs in eviction of storage memory by
>> > execution.
>> > SPARK-12258 correct passing null into ScalaUDF
>> >
>> > Notable Features Since 1.5

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-18 Thread Mark Grover
Thanks Sean for sending me the logs offline.

Turns out the tests are failing again, for reasons unrelated to Spark. I
have filed https://issues.apache.org/jira/browse/SPARK-12426 for that with
some details. In the meanwhile, I agree with Sean, these tests should be
disabled. And, again, I don't think this failures warrants blocking the
release.

Mark

On Fri, Dec 18, 2015 at 9:32 AM, Sean Owen  wrote:

> Yes that's what I mean. If they're not quite working, let's disable
> them, but first, we have to rule out that I'm not just missing some
> requirement.
>
> Functionally, it's not worth blocking the release. It seems like bad
> form to release with tests that always fail for a non-trivial number
> of users, but we have to establish that. If it's something with an
> easy fix (or needs disabling) and another RC needs to be baked, might
> be worth including.
>
> Logs coming offline
>
> On Fri, Dec 18, 2015 at 5:30 PM, Mark Grover  wrote:
> > Sean,
> > Are you referring to docker integration tests? If so, they were disabled
> > for the majority of the release, and I recently worked on it (SPARK-11796);
> > once it got committed, the tests were re-enabled in Spark builds. I am not
> > sure what OSs the test builds use, but they should be passing there too.
> >
> > During my work, I tested on Ubuntu Precise and they worked. If you could
> > share the logs with me offline, I could take a look. Alternatively, I can
> > try to see if I can get Ubuntu 15 instance. However, given the history of
> > these tests, I personally don't think it makes sense to block the release
> > based on them not running on Ubuntu 15.
> >
> > On Fri, Dec 18, 2015 at 9:22 AM, Sean Owen  wrote:
> >>
> >> For me, mostly the same as before: tests are mostly passing, but I can
> >> never get the docker tests to pass. If anyone knows a special profile
> >> or package that needs to be enabled, I can try that and/or
> >> fix/document it. Just wondering if it's me.
> >>
> >> I'm on Java 7 + Ubuntu 15.10, with -Pyarn -Phive -Phive-thriftserver
> >> -Phadoop-2.6
> >>
> >> On Wed, Dec 16, 2015 at 9:32 PM, Michael Armbrust
> >>  wrote:
> >> > Please vote on releasing the following candidate as Apache Spark
> version
> >> > 1.6.0!
> >> >
> >> > The vote is open until Saturday, December 19, 2015 at 18:00 UTC and
> >> > passes
> >> > if a majority of at least 3 +1 PMC votes are cast.
> >> >
> >> > [ ] +1 Release this package as Apache Spark 1.6.0
> >> > [ ] -1 Do not release this package because ...
> >> >
> >> > To learn more about Apache Spark, please see http://spark.apache.org/
> >> >
> >> > The tag to be voted on is v1.6.0-rc3
> >> > (168c89e07c51fa24b0bb88582c739cec0acb44d7)
> >> >
> >> > The release files, including signatures, digests, etc. can be found
> at:
> >> >
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-bin/
> >> >
> >> > Release artifacts are signed with the following key:
> >> > https://people.apache.org/keys/committer/pwendell.asc
> >> >
> >> > The staging repository for this release can be found at:
> >> >
> https://repository.apache.org/content/repositories/orgapachespark-1174/
> >> >
> >> > The test repository (versioned as v1.6.0-rc3) for this release can be
> >> > found
> >> > at:
> >> >
> https://repository.apache.org/content/repositories/orgapachespark-1173/
> >> >
> >> > The documentation corresponding to this release can be found at:
> >> >
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-docs/
> >> >
> >> > ===
> >> > == How can I help test this release? ==
> >> > ===
> >> > If you are a Spark user, you can help us test this release by taking
> an
> >> > existing Spark workload and running on this release candidate, then
> >> > reporting any regressions.
> >> >
> >> > 
> >> > == What justifies a -1 vote for this release? ==
> >> > 
> >> > This vote is happening towards the end of the 1.6 QA period, so -1
> votes
> >> > should only occur for significant regressions from 1.5. Bugs already
> >> > present
> >> > in 1.5, minor regressions, or bugs related to new features will not
> >> > block
> >> > this release.
> >> >
> >> > ===
> >> > == What should happen to JIRA tickets still targeting 1.6.0? ==
> >> > ===
> >> > 1. It is OK for documentation patches to target 1.6.0 and still go
> into
> >> > branch-1.6, since documentation will be published separately from the
> >> > release.
> >> > 2. New features for non-alpha-modules should target 1.7+.
> >> > 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the
> >> > target
> >> > version.
> >> >
> >> >
> >> > ==
> >> 

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-18 Thread Marcelo Vanzin
+1 (non-binding)

Tested the without-hadoop binaries (so I didn't run Hive-related tests)
with a test batch including standalone / client, yarn / client and
cluster, including core, mllib and streaming (flume and kafka).

On Wed, Dec 16, 2015 at 1:32 PM, Michael Armbrust
 wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 1.6.0!
>
> The vote is open until Saturday, December 19, 2015 at 18:00 UTC and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.6.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v1.6.0-rc3
> (168c89e07c51fa24b0bb88582c739cec0acb44d7)
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1174/
>
> The test repository (versioned as v1.6.0-rc3) for this release can be found
> at:
> https://repository.apache.org/content/repositories/orgapachespark-1173/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-docs/
>
> ===
> == How can I help test this release? ==
> ===
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> 
> == What justifies a -1 vote for this release? ==
> 
> This vote is happening towards the end of the 1.6 QA period, so -1 votes
> should only occur for significant regressions from 1.5. Bugs already present
> in 1.5, minor regressions, or bugs related to new features will not block
> this release.
>
> ===
> == What should happen to JIRA tickets still targeting 1.6.0? ==
> ===
> 1. It is OK for documentation patches to target 1.6.0 and still go into
> branch-1.6, since documentation will be published separately from the
> release.
> 2. New features for non-alpha-modules should target 1.7+.
> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
> version.
>
>
> ==
> == Major changes to help you focus your testing ==
> ==
>
> Notable changes since 1.6 RC2
>
>
> - SPARK_VERSION has been set correctly
> - SPARK-12199 ML Docs are publishing correctly
> - SPARK-12345 Mesos cluster mode has been fixed
>
> Notable changes since 1.6 RC1
>
> Spark Streaming
>
> SPARK-2629  trackStateByKey has been renamed to mapWithState
>
> Spark SQL
>
> SPARK-12165 SPARK-12189 Fix bugs in eviction of storage memory by execution.
> SPARK-12258 correct passing null into ScalaUDF
>
> Notable Features Since 1.5
>
> Spark SQL
>
> SPARK-11787 Parquet Performance - Improve Parquet scan performance when
> using flat schemas.
> SPARK-10810 Session Management - Isolated default database (i.e., USE mydb)
> even on shared clusters.
> SPARK-  Dataset API - A type-safe API (similar to RDDs) that performs
> many operations on serialized binary data and code generation (i.e. Project
> Tungsten).
> SPARK-1 Unified Memory Management - Shared memory for execution and
> caching instead of exclusive division of the regions.
> SPARK-11197 SQL Queries on Files - Concise syntax for running SQL queries
> over files of any supported format without registering a table.
> SPARK-11745 Reading non-standard JSON files - Added options to read
> non-standard JSON files (e.g. single-quotes, unquoted attributes)
> SPARK-10412 Per-operator Metrics for SQL Execution - Display statistics on a
> per-operator basis for memory usage and spilled data size.
> SPARK-11329 Star (*) expansion for StructTypes - Makes it easier to nest and
> unnest arbitrary numbers of columns
> SPARK-10917, SPARK-11149 In-memory Columnar Cache Performance - Significant
> (up to 14x) speed up when caching data that contains complex types in
> DataFrames or SQL.
> SPARK-1 Fast null-safe joins - Joins using null-safe equality (<=>) will
> now execute using SortMergeJoin instead of computing a cartesian product.
> SPARK-11389 SQL Execution Using Off-Heap Memory - Support for configuring
> query execution to occur using off-heap memory to avoid GC overhead
> SPARK-10978 Datasource API Avoid Double Filter - When implementing a
> datasource with filter pushdown, developers can now tell Spark SQL to avoid
> double 

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-18 Thread Denny Lee
+1 (non-binding)

Tested a number of tests surrounding DataFrames, Datasets, and ML.


On Wed, Dec 16, 2015 at 1:32 PM Michael Armbrust 
wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 1.6.0!
>
> The vote is open until Saturday, December 19, 2015 at 18:00 UTC and
> passes if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.6.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v1.6.0-rc3
> (168c89e07c51fa24b0bb88582c739cec0acb44d7)
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1174/
>
> The test repository (versioned as v1.6.0-rc3) for this release can be
> found at:
> https://repository.apache.org/content/repositories/orgapachespark-1173/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-docs/
>
> ===
> == How can I help test this release? ==
> ===
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> 
> == What justifies a -1 vote for this release? ==
> 
> This vote is happening towards the end of the 1.6 QA period, so -1 votes
> should only occur for significant regressions from 1.5. Bugs already
> present in 1.5, minor regressions, or bugs related to new features will not
> block this release.
>
> ===
> == What should happen to JIRA tickets still targeting 1.6.0? ==
> ===
> 1. It is OK for documentation patches to target 1.6.0 and still go into
> branch-1.6, since documentation will be published separately from the
> release.
> 2. New features for non-alpha-modules should target 1.7+.
> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
> version.
>
>
> ==
> == Major changes to help you focus your testing ==
> ==
>
> Notable changes since 1.6 RC2
> - SPARK_VERSION has been set correctly
> - SPARK-12199 ML Docs are publishing correctly
> - SPARK-12345 Mesos cluster mode has been fixed
>
> Notable changes since 1.6 RC1
> Spark Streaming
>
>- SPARK-2629  
>trackStateByKey has been renamed to mapWithState
>
> Spark SQL
>
>- SPARK-12165 
>SPARK-12189  Fix
>bugs in eviction of storage memory by execution.
>- SPARK-12258  correct
>passing null into ScalaUDF
>
> Notable Features Since 1.5
>
> Spark SQL
>
>- SPARK-11787  Parquet
>Performance - Improve Parquet scan performance when using flat schemas.
>- SPARK-10810 
>    Session Management - Isolated default database (i.e., USE mydb) even on
>shared clusters.
>- SPARK-   Dataset
>API - A type-safe API (similar to RDDs) that performs many operations
>on serialized binary data and code generation (i.e. Project Tungsten).
>- SPARK-1  Unified
>Memory Management - Shared memory for execution and caching instead of
>exclusive division of the regions.
>- SPARK-11197  SQL
>Queries on Files - Concise syntax for running SQL queries over files
>of any supported format without registering a table.
>- SPARK-11745  Reading
>non-standard JSON files - Added options to read non-standard JSON
>files (e.g. single-quotes, unquoted attributes)
>- SPARK-10412  
> Per-operator
>    Metrics for SQL Execution - Display statistics on a per-operator basis
>for memory usage and spilled data size.
>- SPARK-11329  Star
>(*) expansion for StructTypes - Makes it easier to nest and 

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-17 Thread Krishna Sankar
+1 (non-binding, of course)

1. Compiled OSX 10.10 (Yosemite) OK Total time: 29:32 min
 mvn clean package -Pyarn -Phadoop-2.6 -DskipTests
2. Tested pyspark, mllib (iPython 4.0)
2.0 Spark version is 1.6.0
2.1. statistics (min,max,mean,Pearson,Spearman) OK
2.2. Linear/Ridge/Lasso Regression OK
2.3. Decision Tree, Naive Bayes OK
2.4. KMeans OK
   Center And Scale OK
2.5. RDD operations OK
  State of the Union Texts - MapReduce, Filter,sortByKey (word count)
2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK
   Model evaluation/optimization (rank, numIter, lambda) with itertools
OK
3. Scala - MLlib
3.1. statistics (min,max,mean,Pearson,Spearman) OK
3.2. LinearRegressionWithSGD OK
3.3. Decision Tree OK
3.4. KMeans OK
3.5. Recommendation (Movielens medium dataset ~1 M ratings) OK
3.6. saveAsParquetFile OK
3.7. Read and verify the 4.3 save(above) - sqlContext.parquetFile,
registerTempTable, sql OK
3.8. result = sqlContext.sql("SELECT
OrderDetails.OrderID,ShipCountry,UnitPrice,Qty,Discount FROM Orders INNER
JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID") OK
4.0. Spark SQL from Python OK
4.1. result = sqlContext.sql("SELECT * from people WHERE State = 'WA'") OK
5.0. Packages
5.1. com.databricks.spark.csv - read/write OK (--packages
com.databricks:spark-csv_2.10:1.3.0)
6.0. DataFrames
6.1. cast,dtypes OK
6.2. groupBy,avg,crosstab,corr,isNull,na.drop OK
6.3. All joins,sql,set operations,udf OK
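
For 5.1, a Scala sketch of the spark-csv round trip (paths and options
illustrative; assumes an existing SparkContext sc and the spark-csv package on
the classpath):

  import org.apache.spark.sql.SQLContext

  val sqlContext = new SQLContext(sc)
  val df = sqlContext.read
    .format("com.databricks.spark.csv")
    .option("header", "true")        // first line holds column names
    .option("inferSchema", "true")   // infer column types
    .load("cars.csv")
  df.write
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .save("cars-out.csv")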

Cheers & Good work guys


On Wed, Dec 16, 2015 at 1:32 PM, Michael Armbrust 
wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 1.6.0!
>
> The vote is open until Saturday, December 19, 2015 at 18:00 UTC and
> passes if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.6.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v1.6.0-rc3
> (168c89e07c51fa24b0bb88582c739cec0acb44d7)
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1174/
>
> The test repository (versioned as v1.6.0-rc3) for this release can be
> found at:
> https://repository.apache.org/content/repositories/orgapachespark-1173/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-docs/
>
> ===
> == How can I help test this release? ==
> ===
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> 
> == What justifies a -1 vote for this release? ==
> 
> This vote is happening towards the end of the 1.6 QA period, so -1 votes
> should only occur for significant regressions from 1.5. Bugs already
> present in 1.5, minor regressions, or bugs related to new features will not
> block this release.
>
> ===
> == What should happen to JIRA tickets still targeting 1.6.0? ==
> ===
> 1. It is OK for documentation patches to target 1.6.0 and still go into
> branch-1.6, since documentation will be published separately from the
> release.
> 2. New features for non-alpha-modules should target 1.7+.
> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
> version.
>
>
> ==
> == Major changes to help you focus your testing ==
> ==
>
> Notable changes since 1.6 RC2
> - SPARK_VERSION has been set correctly
> - SPARK-12199 ML Docs are publishing correctly
> - SPARK-12345 Mesos cluster mode has been fixed
>
> Notable changes since 1.6 RC1
> Spark Streaming
>
>- SPARK-2629  
>trackStateByKey has been renamed to mapWithState
>
> Spark SQL
>
>- SPARK-12165 
>SPARK-12189  Fix
>bugs in eviction of storage memory by execution.
>- SPARK-12258  correct
>passing null into ScalaUDF
>
> Notable Features Since 1.5
>
> Spark SQL
>
>- SPARK-11787  Parquet
>Performance - Improve Parquet scan 

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-17 Thread Vinay Shukla
Agree with Andrew; we shouldn't block the release for this.

This issue won't be there in the Spark distribution from Hortonworks, since we
set the HDP version.

If you want to use Apache Spark with HDP, you can modify mapred-site.xml
to replace the hdp.version property with the right value for your cluster.
You can find the right value by invoking the hdp-select script on a node
that has HDP installed. On my system, running it returns the following:
hdp-select status hadoop-client
hadoop-client - 2.2.5.0-2644
Here is a one-line script to get the version:
export HDP_VER=`hdp-select status hadoop-client | sed 's/hadoop-client -
\(.*\)/\1/'`

CAUTION - if you modify mapred-site.xml on a node of the cluster, this will
break rolling upgrades in certain scenarios, where a program like Oozie
submitting a job from that node will use the hardcoded version instead of
the version specified by the client.

So what does the Hortonworks distribution do under the covers to support
hdp.version?

- Create a file called java-opts with the following config value in it:
  -Dhdp.version=2.2.5.0-2644
  You can also specify the same value using SPARK_JAVA_OPTS, i.e.
  export SPARK_JAVA_OPTS="-Dhdp.version=2.2.5.0-2644"
- Add the following options to spark-defaults.conf:
  spark.driver.extraJavaOptions   -Dhdp.version=2.2.5.0-2644
  spark.yarn.am.extraJavaOptions  -Dhdp.version=2.2.5.0-2644
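
If you are building the SparkConf programmatically instead, a minimal sketch
(the version value is illustrative, for my cluster):

  import org.apache.spark.SparkConf

  // Pass hdp.version to both the driver and the YARN AM JVMs.
  val conf = new SparkConf()
    .set("spark.driver.extraJavaOptions", "-Dhdp.version=2.2.5.0-2644")
    .set("spark.yarn.am.extraJavaOptions", "-Dhdp.version=2.2.5.0-2644")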






[VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-16 Thread Michael Armbrust
Please vote on releasing the following candidate as Apache Spark version
1.6.0!

The vote is open until Saturday, December 19, 2015 at 18:00 UTC and passes
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.6.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v1.6.0-rc3
(168c89e07c51fa24b0bb88582c739cec0acb44d7)

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1174/

The test repository (versioned as v1.6.0-rc3) for this release can be found
at:
https://repository.apache.org/content/repositories/orgapachespark-1173/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-docs/

===
== How can I help test this release? ==
===
If you are a Spark user, you can help us test this release by taking an
existing Spark workload and running on this release candidate, then
reporting any regressions.


== What justifies a -1 vote for this release? ==

This vote is happening towards the end of the 1.6 QA period, so -1 votes
should only occur for significant regressions from 1.5. Bugs already
present in 1.5, minor regressions, or bugs related to new features will not
block this release.

===
== What should happen to JIRA tickets still targeting 1.6.0? ==
===
1. It is OK for documentation patches to target 1.6.0 and still go into
branch-1.6, since documentation will be published separately from the
release.
2. New features for non-alpha-modules should target 1.7+.
3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
version.


==
== Major changes to help you focus your testing ==
==

Notable changes since 1.6 RC2
- SPARK_VERSION has been set correctly
- SPARK-12199 ML Docs are publishing correctly
- SPARK-12345 Mesos cluster mode has been fixed

Notable changes since 1.6 RC1
Spark Streaming

   - SPARK-2629  
   trackStateByKey has been renamed to mapWithState
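
For those migrating, a minimal sketch of the renamed API (a running word
count, following the 1.6 streaming examples; pairs is assumed to be an
existing DStream[(String, Int)]):

  import org.apache.spark.streaming.{State, StateSpec}

  // Keep a running count per key across batches.
  val mappingFunc = (word: String, one: Option[Int], state: State[Int]) => {
    val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
    state.update(sum)
    (word, sum)
  }
  val stateDstream = pairs.mapWithState(StateSpec.function(mappingFunc))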

Spark SQL

   - SPARK-12165, SPARK-12189 Fix bugs in eviction of storage memory by
   execution.
   - SPARK-12258 Correct passing null into ScalaUDF (see the sketch after
   this list)
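
A small sketch of the affected pattern (a Scala UDF applied to a nullable
column), assuming a spark-shell style SparkContext named sc; the column
names are illustrative, and the exact failing case is in the JIRA:

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.udf

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// With the fix, a SQL NULL reaches the function as a Scala null instead
// of being mishandled on the way into the UDF.
val orMissing = udf((s: String) => if (s == null) "missing" else s)

val df = Seq(("a", 1), (null, 2)).toDF("name", "n")
df.select(orMissing($"name")).show()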

Notable Features Since 1.5

Spark SQL

   - SPARK-11787 Parquet Performance - Improve Parquet scan performance
   when using flat schemas.
   - SPARK-10810 Session Management - Isolated default database (i.e. USE
   mydb) even on shared clusters.
   - SPARK- Dataset API - A type-safe API (similar to RDDs) that
   performs many operations on serialized binary data and code generation
   (i.e. Project Tungsten).
   - SPARK-1 Unified Memory Management - Shared memory for execution
   and caching instead of exclusive division of the regions.
   - SPARK-11197 SQL Queries on Files - Concise syntax for running SQL
   queries over files of any supported format without registering a table.
   - SPARK-11745 Reading non-standard JSON files - Added options to read
   non-standard JSON files (e.g. single-quotes, unquoted attributes).
   - SPARK-10412 Per-operator Metrics for SQL Execution - Display
   statistics on a per-operator basis for memory usage and spilled data
   size.
   - SPARK-11329 Star (*) expansion for StructTypes - Makes it easier to
   nest and unnest arbitrary numbers of columns.
   - SPARK-10917, SPARK-11149 In-memory Columnar Cache Performance -
   Significant (up to 14x) speed up when caching data that contains
   complex types in DataFrames or SQL.
   - SPARK-1 Fast null-safe joins - Joins using null-safe equality (<=>)
   will now execute using SortMergeJoin instead of computing a cartesian
   product.
   - SPARK-11389 SQL Execution Using Off-Heap Memory - Support for
   configuring query execution to occur using off-heap memory to avoid GC
   overhead.
   - SPARK-10978 Datasource API Avoid Double Filter - When implementing a
   datasource with filter pushdown, developers can now tell Spark SQL to
   avoid double evaluating a pushed-down filter.
   - SPARK-4849 Advanced Layout of Cached Data - Storing partitioning and
   ordering schemes in In-memory table scan, and adding distributeBy and
   localSort to the DataFrame API.

(Several of the items above are sketched in code immediately below.)
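
A hedged Scala sketch touching several of these features, written against
the 1.6 APIs; all paths, column names, and schemas are illustrative
assumptions, not taken from the release notes:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// SQL Execution Using Off-Heap Memory (SPARK-11389); size is illustrative.
val conf = new SparkConf()
  .setAppName("Spark16FeatureSketch")
  .setMaster("local[2]")
  .set("spark.memory.offHeap.enabled", "true")
  .set("spark.memory.offHeap.size", "268435456") // 256 MB, in bytes

val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Dataset API: a type-safe view over semi-structured data.
case class Person(name: String, age: Long)
val people = sqlContext.read.json("/data/people.json").as[Person]
val adultNames = people.filter(_.age >= 21).map(_.name)

// SQL Queries on Files (SPARK-11197): query a file without registering
// a table first.
val events = sqlContext.sql("SELECT * FROM parquet.`/data/events.parquet`")

// Reading non-standard JSON files (SPARK-11745): opt in to relaxed parsing.
val relaxed = sqlContext.read
  .option("allowSingleQuotes", "true")
  .option("allowUnquotedFieldNames", "true")
  .json("/data/messy.json")

// Star (*) expansion for StructTypes (SPARK-11329): flatten a struct column.
val flattened = relaxed.select($"payload.*")

// Fast null-safe joins: <=> now plans a SortMergeJoin rather than a
// cartesian product.
val joined = events.join(relaxed, events("id") <=> relaxed("id"))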

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-16 Thread Jiří Syrový
+1 Tested in standalone mode and so far seems to be fairly stable.

2015-12-16 22:32 GMT+01:00 Michael Armbrust :

> [...]

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-16 Thread singinpirate
-0 (non-binding)

I have observed that when we set spark.executor.port in 1.6, an NPE is
thrown in SparkEnv$.create(SparkEnv.scala:259). It used to work in 1.5.2.
Is anyone else seeing this?
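
For reference, a hypothetical repro sketch; the app name, master, and port
value are illustrative assumptions, and the NPE is as reported above, not
verified here:

import org.apache.spark.{SparkConf, SparkContext}

// Setting the legacy executor port option on 1.6.0 RC3; per the report,
// SparkEnv$.create then throws an NPE (it worked in 1.5.2).
val conf = new SparkConf()
  .setAppName("ExecutorPortRepro") // illustrative
  .setMaster("local[2]")           // deploy mode not specified in the report
  .set("spark.executor.port", "56000")
val sc = new SparkContext(conf)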

On Wed, Dec 16, 2015 at 2:26 PM Jiří Syrový  wrote:

> [...]

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-16 Thread Marcelo Vanzin
I was going to say that spark.executor.port is not used anymore in
1.6, but damn, there's still that akka backend hanging around there
even when netty is being used... we should fix this, should be a
simple one-liner.

On Wed, Dec 16, 2015 at 2:35 PM, singinpirate  wrote:
> -0 (non-binding)
>
> I have observed that when we set spark.executor.port in 1.6, an NPE is
> thrown in SparkEnv$.create(SparkEnv.scala:259). It used to work in 1.5.2.
> Is anyone else seeing this?

-- 
Marcelo




Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-16 Thread Rad Gruchalski
I also noticed that spark.replClassServer.host and spark.replClassServer.port 
aren’t used anymore. The transport now happens over the main RpcEnv.

Kind regards,

Radek Gruchalski

ra...@gruchalski.com
de.linkedin.com/in/radgruchalski/




On Wednesday, 16 December 2015 at 23:43, Marcelo Vanzin wrote:

> [...]




Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-16 Thread Jean-Baptiste Onofré

+1 (non-binding)

Tested in standalone and YARN with different samples.

Regards
JB

On 12/16/2015 10:32 PM, Michael Armbrust wrote:

[...]

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-16 Thread Yin Huai
+1

On Wed, Dec 16, 2015 at 7:19 PM, Patrick Wendell  wrote:

> +1
>
> [...]


Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-16 Thread Michael Armbrust
+1

On Wed, Dec 16, 2015 at 4:37 PM, Andrew Or  wrote:

> +1
>
> Mesos cluster mode regression in RC2 is now fixed (SPARK-12345 / PR10332).
>
> Also tested on standalone client and cluster mode. No problems.
>
> [...]


Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-16 Thread Ted Yu
Ran test suite (minus docker-integration-tests)
All passed

+1

[INFO] Spark Project External ZeroMQ .. SUCCESS [ 13.647 s]
[INFO] Spark Project External Kafka ... SUCCESS [ 45.424 s]
[INFO] Spark Project Examples . SUCCESS [02:06 min]
[INFO] Spark Project External Kafka Assembly .. SUCCESS [ 11.280 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:49 h
[INFO] Finished at: 2015-12-16T17:06:58-08:00

On Wed, Dec 16, 2015 at 4:37 PM, Andrew Or  wrote:

> [...]


Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-16 Thread Mark Hamstra
+1

On Wed, Dec 16, 2015 at 1:32 PM, Michael Armbrust wrote:

> [...]

Re: Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-16 Thread Allen Zhang
plus 1

On 2015-12-17 09:39:39, "Joseph Bradley" wrote:

+1

[...]

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-16 Thread Joseph Bradley
+1

On Wed, Dec 16, 2015 at 5:26 PM, Reynold Xin  wrote:

> +1
>
> [...]

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-16 Thread Patrick Wendell
+1

On Wed, Dec 16, 2015 at 6:15 PM, Ted Yu  wrote:

> [...]
>