spark git commit: [SPARK-7320] [SQL] Add Cube / Rollup for dataframe

2015-05-20 Thread lian
Repository: spark Updated Branches: refs/heads/master b3abf0b8d -> 09265ad7c [SPARK-7320] [SQL] Add Cube / Rollup for dataframe Add `cube` and `rollup` for DataFrame For example: ```scala testData.rollup($"a" + $"b", $"b").agg(sum($"a" - $"b")) testData.cube($"a" + $"b", $"b").agg(sum($"a" - $"b")) ``` Author:

spark git commit: Revert "[SPARK-7320] [SQL] Add Cube / Rollup for dataframe"

2015-05-20 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 829f1d95b -> 6338c40da Revert "[SPARK-7320] [SQL] Add Cube / Rollup for dataframe" This reverts commit 10698e1131f665addb454cd498669920699a91b2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-7579] [ML] [DOC] User guide update for OneHotEncoder

2015-05-20 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 2ad4837cf -> 829f1d95b [SPARK-7579] [ML] [DOC] User guide update for OneHotEncoder Author: Sandy Ryza sa...@cloudera.com Closes #6126 from sryza/sandy-spark-7579 and squashes the following commits: 5af803d [Sandy Ryza] SPARK-7579 [MLLIB]
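The following plain-Scala sketch shows the transformation the guide documents.

The `OneHot.oneHot` helper is hypothetical and is not the `OneHotEncoder` API. Its `dropLast` flag models the encoder's option of omitting the last category, which keeps the output columns from being linearly dependent. Check the guide for the encoder's actual default and its sparse output type.

```scala
object OneHot {
  // Encode a 0-based category index among numCategories as a dense 0/1 vector.
  // With dropLast = true the vector has numCategories - 1 slots and the last
  // category maps to the all-zeros vector.
  def oneHot(index: Int, numCategories: Int, dropLast: Boolean): Array[Double] = {
    require(index >= 0 && index < numCategories, s"index $index out of range")
    val size = if (dropLast) numCategories - 1 else numCategories
    val out = Array.fill(size)(0.0)
    if (index < size) out(index) = 1.0
    out
  }
}
```

For three categories, index 1 becomes `[0.0, 1.0, 0.0]` without dropping. With `dropLast = true`, index 2 becomes `[0.0, 0.0]`.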

spark git commit: Revert "[SPARK-7320] [SQL] Add Cube / Rollup for dataframe"

2015-05-20 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.4 ae8a854ca -> f84bdbce8 Revert "[SPARK-7320] [SQL] Add Cube / Rollup for dataframe" This reverts commit 10698e1131f665addb454cd498669920699a91b2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-7579] [ML] [DOC] User guide update for OneHotEncoder

2015-05-20 Thread jkbradley
Repository: spark Updated Branches: refs/heads/branch-1.4 b40c5ed7a -> ae8a854ca [SPARK-7579] [ML] [DOC] User guide update for OneHotEncoder Author: Sandy Ryza sa...@cloudera.com Closes #6126 from sryza/sandy-spark-7579 and squashes the following commits: 5af803d [Sandy Ryza] SPARK-7579

spark git commit: [SPARK-7713] [SQL] Use shared broadcast hadoop conf for partitioned table scan.

2015-05-20 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 98a46f9df -> b631bf73b [SPARK-7713] [SQL] Use shared broadcast hadoop conf for partitioned table scan. https://issues.apache.org/jira/browse/SPARK-7713 I tested the performance with the following code: ```scala import sqlContext._ import

spark git commit: [SPARK-7713] [SQL] Use shared broadcast hadoop conf for partitioned table scan.

2015-05-20 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 606ae3e10 -> 55bd1bb52 [SPARK-7713] [SQL] Use shared broadcast hadoop conf for partitioned table scan. https://issues.apache.org/jira/browse/SPARK-7713 I tested the performance with the following code: ```scala import sqlContext._

spark git commit: [SPARK-7537] [MLLIB] spark.mllib API updates

2015-05-20 Thread meng
Repository: spark Updated Branches: refs/heads/master b631bf73b -> 2ad4837cf [SPARK-7537] [MLLIB] spark.mllib API updates Minor updates to the spark.mllib APIs: 1. Add `DeveloperApi` to `PMMLExportable` and add `Experimental` to `toPMML` methods. 2. Mention `RankingMetrics.of` in the

spark git commit: [SPARK-7537] [MLLIB] spark.mllib API updates

2015-05-20 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.4 55bd1bb52 -> b40c5ed7a [SPARK-7537] [MLLIB] spark.mllib API updates Minor updates to the spark.mllib APIs: 1. Add `DeveloperApi` to `PMMLExportable` and add `Experimental` to `toPMML` methods. 2. Mention `RankingMetrics.of` in the

spark git commit: [SPARK-7511] [MLLIB] pyspark ml seed param should be random by default or 42 is quite funny but not very random

2015-05-20 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 6338c40da -> 191ee4745 [SPARK-7511] [MLLIB] pyspark ml seed param should be random by default or 42 is quite funny but not very random Author: Holden Karau hol...@pigscanfly.ca Closes #6139 from
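One way to avoid a single hard-coded default like 42, while keeping results reproducible, is to derive each estimator's default seed from its class name.

This Scala sketch only illustrates that idea. The patch itself changes PySpark, and whether it uses exactly this scheme is an assumption. `HasSeedSketch`, `KMeansLike` and `ForestLike` are hypothetical names.

```scala
// Hypothetical mixin: an explicit seed wins; otherwise the default is the
// hash of the concrete class name, stable across JVM runs but different per class.
trait HasSeedSketch {
  private var explicitSeed: Option[Long] = None

  def setSeed(value: Long): this.type = { explicitSeed = Some(value); this }

  def getSeed: Long = explicitSeed.getOrElse(getClass.getName.hashCode.toLong)
}

// Two illustrative estimators sharing the mixin.
class KMeansLike extends HasSeedSketch
class ForestLike extends HasSeedSketch
```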

spark git commit: [SPARK-7511] [MLLIB] pyspark ml seed param should be random by default or 42 is quite funny but not very random

2015-05-20 Thread jkbradley
Repository: spark Updated Branches: refs/heads/branch-1.4 f84bdbce8 -> 096cb127a [SPARK-7511] [MLLIB] pyspark ml seed param should be random by default or 42 is quite funny but not very random Author: Holden Karau hol...@pigscanfly.ca Closes #6139 from

[1/2] spark git commit: Preparing Spark release rc-test

2015-05-20 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.4 a502e4b84 -> 205ed15f2 Preparing Spark release rc-test Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/09a1c623 Tree:

Git Push Summary

2015-05-20 Thread pwendell
Repository: spark Updated Tags: refs/tags/rc-test [created] 09a1c6231 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org

[2/2] spark git commit: Preparing development version 1.4.0-SNAPSHOT

2015-05-20 Thread pwendell
Preparing development version 1.4.0-SNAPSHOT Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/205ed15f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/205ed15f Diff:

Git Push Summary

2015-05-20 Thread pwendell
Repository: spark Updated Tags: refs/tags/rc-test [deleted] 5f4d87f60

spark git commit: [SPARK-7762] [MLLIB] set default value for outputCol

2015-05-20 Thread jkbradley
Repository: spark Updated Branches: refs/heads/branch-1.4 8d6684986 -> 5f64269c5 [SPARK-7762] [MLLIB] set default value for outputCol Set a default value for `outputCol` instead of forcing users to name it. This is useful for intermediate transformers in the pipeline. jkbradley Author:

[2/2] spark git commit: Preparing development version 1.4.0-SNAPSHOT

2015-05-20 Thread pwendell
Preparing development version 1.4.0-SNAPSHOT Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8d668498 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8d668498 Diff:

Git Push Summary

2015-05-20 Thread pwendell
Repository: spark Updated Tags: refs/tags/rc-test [created] ae29aeaf8

spark git commit: [SPARK-7762] [MLLIB] set default value for outputCol

2015-05-20 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master f2faa7af3 -> c330e52da [SPARK-7762] [MLLIB] set default value for outputCol Set a default value for `outputCol` instead of forcing users to name it. This is useful for intermediate transformers in the pipeline. jkbradley Author: Xiangrui
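A natural way to supply such a default is to derive it from the stage's unique id, so that two unnamed intermediate stages never write to the same column.

In this sketch, `Stage`, the uid format and the `__output` suffix are assumptions chosen for illustration. They are not taken from the patch.

```scala
import java.util.UUID

// Hypothetical stand-in for a pipeline stage whose output column is optional.
class Stage(prefix: String) {
  // Per-instance id such as "tok_3f9c2a1b7d4e" (format is illustrative).
  val uid: String = prefix + "_" + UUID.randomUUID().toString.takeRight(12)

  private var outputCol: Option[String] = None

  def setOutputCol(name: String): this.type = { outputCol = Some(name); this }

  // Without an explicit name, fall back to a uid-based default, so an
  // intermediate stage needs no name and two stages never share a default.
  def getOutputCol: String = outputCol.getOrElse(uid + "__output")
}
```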

[1/2] spark git commit: Preparing Spark release rc-test

2015-05-20 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.4 534c787b9 -> 8d6684986 Preparing Spark release rc-test Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ae29aeaf Tree:

Git Push Summary

2015-05-20 Thread pwendell
Repository: spark Updated Tags: refs/tags/rc-test [deleted] ae29aeaf8

spark git commit: [SPARK-7251] Perform sequential scan when iterating over BytesToBytesMap

2015-05-20 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 7956dd7ab -> f2faa7af3 [SPARK-7251] Perform sequential scan when iterating over BytesToBytesMap This patch modifies `BytesToBytesMap.iterator()` to iterate through records in the order that they appear in the data pages rather than
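The difference between the two iteration orders can be seen in a toy map. It appends records to fixed-size pages and keeps only (page, offset) pointers in its hash index.

`PagedMap` is hypothetical and far simpler than `BytesToBytesMap`. It only illustrates that walking the pages one by one visits records in the order they were written, with no pointer chasing through hash slots.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy paged map: records live in append-only pages of fixed capacity;
// the index maps each key to its (page, offset) location.
final class PagedMap(pageCapacity: Int) {
  private val pages = ArrayBuffer(ArrayBuffer.empty[(String, Int)])
  private val index = scala.collection.mutable.HashMap.empty[String, (Int, Int)]

  def put(key: String, value: Int): Unit = index.get(key) match {
    case Some((p, o)) => pages(p)(o) = key -> value // overwrite in place
    case None =>
      if (pages.last.size == pageCapacity) pages += ArrayBuffer.empty[(String, Int)]
      pages.last += key -> value
      index(key) = (pages.size - 1, pages.last.size - 1)
  }

  def get(key: String): Option[Int] =
    index.get(key).map { case (p, o) => pages(p)(o)._2 }

  // Sequential scan: page 0, then page 1, and so on, in write order.
  def iterator: Iterator[(String, Int)] = pages.iterator.flatMap(_.iterator)
}
```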

spark git commit: [SPARK-7719] Re-add UnsafeShuffleWriterSuite test that was removed for Java 6 compat

2015-05-20 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master c330e52da -> 5196efff5 [SPARK-7719] Re-add UnsafeShuffleWriterSuite test that was removed for Java 6 compat This patch re-adds a test which was removed in 9ebb44f8abb1a13f045eed60190954db904ffef7 due to a Java 6 compatibility issue. We

spark git commit: [SPARK-7719] Re-add UnsafeShuffleWriterSuite test that was removed for Java 6 compat

2015-05-20 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.4 9b37e32c5 -> e1f7de33b [SPARK-7719] Re-add UnsafeShuffleWriterSuite test that was removed for Java 6 compat This patch re-adds a test which was removed in 9ebb44f8abb1a13f045eed60190954db904ffef7 due to a Java 6 compatibility issue.

spark git commit: [SPARK-7767] [STREAMING] Added test for checkpoint serialization in StreamingContext.start()

2015-05-20 Thread tdas
Repository: spark Updated Branches: refs/heads/master 9b84443dd -> 3c434cbfd [SPARK-7767] [STREAMING] Added test for checkpoint serialization in StreamingContext.start() Currently, the background checkpointing thread fails silently if the checkpoint is not serializable. It is hard to debug
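A silent failure of this kind can be avoided by serializing once, synchronously, so that the error reaches the caller instead of a background thread.

Below is a minimal sketch of such an eager check using plain Java serialization. `SerializationCheck.ensureSerializable` is a hypothetical helper, not the code added by this patch.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializationCheck {
  // Serialize obj in the caller's thread so a NotSerializableException
  // surfaces immediately rather than inside a background writer.
  // Returns the serialized size on success, or an error message.
  def ensureSerializable(obj: AnyRef): Either[String, Int] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try {
      out.writeObject(obj)
      out.flush()
      Right(bytes.size())
    } catch {
      case e: NotSerializableException => Left(s"not serializable: ${e.getMessage}")
    } finally {
      out.close()
    }
  }
}
```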

spark git commit: [SPARK-7767] [STREAMING] Added test for checkpoint serialization in StreamingContext.start()

2015-05-20 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.4 23356dd0d -> a502e4b84 [SPARK-7767] [STREAMING] Added test for checkpoint serialization in StreamingContext.start() Currently, the background checkpointing thread fails silently if the checkpoint is not serializable. It is hard to

spark git commit: [SPARK-7698] Cache and reuse buffers in ExecutorMemoryAllocator when using heap allocation

2015-05-20 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.4 205ed15f2 -> 7cea552e1 [SPARK-7698] Cache and reuse buffers in ExecutorMemoryAllocator when using heap allocation When on-heap memory allocation is used, ExecutorMemoryManager should maintain a cache / pool of buffers for re-use by

spark git commit: [SPARK-7698] Cache and reuse buffers in ExecutorMemoryAllocator when using heap allocation

2015-05-20 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 3c434cbfd -> 7956dd7ab [SPARK-7698] Cache and reuse buffers in ExecutorMemoryAllocator when using heap allocation When on-heap memory allocation is used, ExecutorMemoryManager should maintain a cache / pool of buffers for re-use by tasks.
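A free list keyed by buffer size is one simple way to get that reuse.

`HeapBufferPool` below is a hypothetical sketch, not `ExecutorMemoryManager`. On `allocate`, it hands back a previously released `Array[Long]` of the same length instead of allocating a new one.

```scala
import scala.collection.mutable

// Illustrative pool of on-heap buffers (Array[Long]) keyed by exact length.
final class HeapBufferPool {
  private val free = mutable.HashMap.empty[Int, List[Array[Long]]]
  private var fresh = 0

  // Number of buffers created with `new` rather than taken from the pool.
  def freshAllocations: Int = synchronized(fresh)

  def allocate(words: Int): Array[Long] = synchronized {
    free.getOrElse(words, Nil) match {
      case buf :: rest =>
        free(words) = rest
        java.util.Arrays.fill(buf, 0L) // zero it so callers see a clean buffer
        buf
      case Nil =>
        fresh += 1
        new Array[Long](words)
    }
  }

  // Return a buffer so a later allocate() of the same length can reuse it.
  def release(buf: Array[Long]): Unit = synchronized {
    free(buf.length) = buf :: free.getOrElse(buf.length, Nil)
  }
}
```

This sketch holds released buffers strongly, so an idle pool never shrinks. A pool used inside an executor would also need some way to bound its memory. Two options are pooling only large buffers, or keeping references that the garbage collector is allowed to clear.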

Git Push Summary

2015-05-20 Thread pwendell
Repository: spark Updated Tags: refs/tags/rc-test [deleted] 1e458e355

[2/2] spark git commit: Preparing development version 1.4.0-SNAPSHOT

2015-05-20 Thread pwendell
Preparing development version 1.4.0-SNAPSHOT Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9b37e32c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9b37e32c Diff:

Git Push Summary

2015-05-20 Thread pwendell
Repository: spark Updated Tags: refs/tags/rc-test [created] 1e458e355

[1/2] spark git commit: Preparing Spark release rc-test

2015-05-20 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.4 5f64269c5 -> 9b37e32c5 Preparing Spark release rc-test Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1e458e35 Tree:

spark git commit: [SPARK-7237] [SPARK-7741] [CORE] [STREAMING] Clean more closures that need cleaning

2015-05-20 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.4 096cb127a -> 23356dd0d [SPARK-7237] [SPARK-7741] [CORE] [STREAMING] Clean more closures that need cleaning SPARK-7741 is the equivalent of SPARK-7237 in streaming. This is an alternative to #6268. Author: Andrew Or

spark git commit: [SPARK-7237] [SPARK-7741] [CORE] [STREAMING] Clean more closures that need cleaning

2015-05-20 Thread tdas
Repository: spark Updated Branches: refs/heads/master 191ee4745 -> 9b84443dd [SPARK-7237] [SPARK-7741] [CORE] [STREAMING] Clean more closures that need cleaning SPARK-7741 is the equivalent of SPARK-7237 in streaming. This is an alternative to #6268. Author: Andrew Or and...@databricks.com

[2/2] spark git commit: Preparing development version 1.4.0-SNAPSHOT

2015-05-20 Thread pwendell
Preparing development version 1.4.0-SNAPSHOT Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/534c787b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/534c787b Diff:

[1/2] spark git commit: Preparing Spark release rc-test

2015-05-20 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.4 82bc518cf -> 534c787b9 Preparing Spark release rc-test Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5f4d87f6 Tree:

Git Push Summary

2015-05-20 Thread pwendell
Repository: spark Updated Tags: refs/tags/rc-test [created] 5f4d87f60

spark git commit: [SPARK-7746][SQL] Add FetchSize parameter for JDBC driver

2015-05-20 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.4 9711e9bf1 -> e70be6987 [SPARK-7746][SQL] Add FetchSize parameter for JDBC driver JIRA: https://issues.apache.org/jira/browse/SPARK-7746 Looks like an easy to add parameter but can show significant performance improvement if the JDBC

spark git commit: [SPARK-7777] [STREAMING] Fix the flaky test in org.apache.spark.streaming.BasicOperationsSuite

2015-05-20 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.4 0d061ff9e -> b6182ce89 [SPARK-] [STREAMING] Fix the flaky test in org.apache.spark.streaming.BasicOperationsSuite Just added a guard to make sure a batch has completed before moving to the next batch. Author: zsxwing

spark git commit: [SPARK-7774] [MLLIB] add sqlContext to MLlibTestSparkContext

2015-05-20 Thread meng
Repository: spark Updated Branches: refs/heads/master 42c592adb -> ddec173cb [SPARK-7774] [MLLIB] add sqlContext to MLlibTestSparkContext to simplify test suites that require a SQLContext. Author: Xiangrui Meng m...@databricks.com Closes #6303 from mengxr/SPARK-7774 and squashes the

spark git commit: [SPARK-7774] [MLLIB] add sqlContext to MLlibTestSparkContext

2015-05-20 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.4 4fd674336 -> 9711e9bf1 [SPARK-7774] [MLLIB] add sqlContext to MLlibTestSparkContext to simplify test suites that require a SQLContext. Author: Xiangrui Meng m...@databricks.com Closes #6303 from mengxr/SPARK-7774 and squashes the

spark git commit: [SPARK-7746][SQL] Add FetchSize parameter for JDBC driver

2015-05-20 Thread rxin
Repository: spark Updated Branches: refs/heads/master ddec173cb -> d0eb9ffe9 [SPARK-7746][SQL] Add FetchSize parameter for JDBC driver JIRA: https://issues.apache.org/jira/browse/SPARK-7746 Looks like an easy to add parameter but can show significant performance improvement if the JDBC

spark git commit: [SPARK-7750] [WEBUI] Rename endpoints from `json` to `api` to allow fu…

2015-05-20 Thread irashid
Repository: spark Updated Branches: refs/heads/branch-1.4 e1f7de33b -> 0d061ff9e [SPARK-7750] [WEBUI] Rename endpoints from `json` to `api` to allow fu… …rther extension to non-json outputs too. Author: Hari Shreedharan hshreedha...@apache.org Closes #6273 from

spark git commit: [SPARK-7750] [WEBUI] Rename endpoints from `json` to `api` to allow fu…

2015-05-20 Thread irashid
Repository: spark Updated Branches: refs/heads/master 5196efff5 -> a70bf06b7 [SPARK-7750] [WEBUI] Rename endpoints from `json` to `api` to allow fu… …rther extension to non-json outputs too. Author: Hari Shreedharan hshreedha...@apache.org Closes #6273 from harishreedharan/json-to-api

spark git commit: [SPARK-7320] [SQL] Add Cube / Rollup for dataframe

2015-05-20 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 b6182ce89 -> 4fd674336 [SPARK-7320] [SQL] Add Cube / Rollup for dataframe This is a follow up for #6257, which broke the maven test. Add cube and rollup for DataFrame For example: ```scala testData.rollup($"a" + $"b", $"b").agg(sum($"a" - $"b"))

spark git commit: [SPARK-7320] [SQL] Add Cube / Rollup for dataframe

2015-05-20 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 895baf8f7 -> 42c592adb [SPARK-7320] [SQL] Add Cube / Rollup for dataframe This is a follow up for #6257, which broke the maven test. Add cube and rollup for DataFrame For example: ```scala testData.rollup($"a" + $"b", $"b").agg(sum($"a" - $"b"))
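The semantics of the two operators can be shown without a cluster. The plain-Scala sketch below lists the grouping sets that each operator expands to for two keys, then computes a sum for each set.

This is not Spark's implementation. `Record`, `GroupingSets`, `rollup`, `cube` and `aggregate` are hypothetical names. `None` stands in for the NULL that a rolled-up column holds in the result.

```scala
// Hypothetical input record: two grouping keys (a, b) and one measure (v).
final case class Record(a: String, b: String, v: Int)

object GroupingSets {
  // rollup over n keys yields the n + 1 prefixes (k1..kn), ..., (k1), ().
  // Each grouping set is represented by the indices of the keys it keeps.
  def rollup(n: Int): Seq[Set[Int]] =
    (n to 0 by -1).map(len => (0 until len).toSet)

  // cube over n keys yields all 2^n subsets of the keys.
  def cube(n: Int): Seq[Set[Int]] =
    (0 until n).toSet.subsets().toSeq

  // Sum v for every grouping set; a key left out of the set becomes None.
  def aggregate(rows: Seq[Record], sets: Seq[Set[Int]]): Map[(Option[String], Option[String]), Int] =
    sets.flatMap { set =>
      rows
        .groupBy(r => (if (set(0)) Some(r.a) else None, if (set(1)) Some(r.b) else None))
        .map { case (key, group) => key -> group.map(_.v).sum }
    }.toMap
}
```

With two keys, `rollup` yields three grouping sets and `cube` yields four. The extra set in `cube` groups by `b` alone, which adds per-`b` subtotals that `rollup` leaves out.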

spark git commit: [SPARK-7389] [CORE] Tachyon integration improvement

2015-05-20 Thread pwendell
Repository: spark Updated Branches: refs/heads/master d0eb9ffe9 -> 04940c497 [SPARK-7389] [CORE] Tachyon integration improvement Two main changes: Add two functions in ExternalBlockManager, which are putValues and getValues because the implementation may not rely on the putBytes and getBytes
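The first change can be sketched as a trait. Its value-level hooks have default implementations that an implementation may override, instead of routing everything through bytes.

Only the four method names come from the commit message. The trait name `ExternalStoreSketch`, both backends, the `Seq[String]` value type and all signatures are hypothetical.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
import scala.collection.mutable

// Hypothetical store: byte-level methods are required; value-level methods
// have defaults that encode through them but may be overridden.
trait ExternalStoreSketch {
  def putBytes(id: String, bytes: Array[Byte]): Unit
  def getBytes(id: String): Option[Array[Byte]]

  // Default: encode the values as a count followed by UTF strings.
  def putValues(id: String, values: Seq[String]): Unit = {
    val buf = new ByteArrayOutputStream()
    val out = new DataOutputStream(buf)
    out.writeInt(values.size)
    values.foreach(v => out.writeUTF(v))
    out.flush()
    putBytes(id, buf.toByteArray)
  }

  def getValues(id: String): Option[Seq[String]] = getBytes(id).map { bytes =>
    val in = new DataInputStream(new ByteArrayInputStream(bytes))
    Seq.fill(in.readInt())(in.readUTF())
  }
}

// A backend that only understands bytes relies on the defaults.
class BytesOnlyStore extends ExternalStoreSketch {
  private val data = mutable.HashMap.empty[String, Array[Byte]]
  def putBytes(id: String, bytes: Array[Byte]): Unit = data(id) = bytes
  def getBytes(id: String): Option[Array[Byte]] = data.get(id)
}

// A backend that can keep objects natively overrides the value-level
// methods and never encodes those values to bytes.
class ObjectStore extends BytesOnlyStore {
  private val objects = mutable.HashMap.empty[String, Seq[String]]
  override def putValues(id: String, values: Seq[String]): Unit = objects(id) = values
  override def getValues(id: String): Option[Seq[String]] = objects.get(id)
}
```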

spark git commit: [SPARK-7654] [MLLIB] Migrate MLlib to the DataFrame reader/writer API

2015-05-20 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.4 10698e113 -> 996e2d4b3 [SPARK-7654] [MLLIB] Migrate MLlib to the DataFrame reader/writer API parquetFile -> read.parquet rxin Author: Xiangrui Meng m...@databricks.com Closes #6281 from mengxr/SPARK-7654 and squashes the following

spark git commit: [SPARK-6094] [MLLIB] Add MultilabelMetrics in PySpark/MLlib

2015-05-20 Thread meng
Repository: spark Updated Branches: refs/heads/master 589b12f8e -> 98a46f9df [SPARK-6094] [MLLIB] Add MultilabelMetrics in PySpark/MLlib Add MultilabelMetrics in PySpark/MLlib Author: Yanbo Liang yblia...@gmail.com Closes #6276 from yanboliang/spark-6094 and squashes the following commits:
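For reference, here are a few common multilabel metrics computed in plain Scala over (predicted labels, true labels) pairs.

This is a sketch, not the `MultilabelMetrics` implementation. In particular, it assumes the label universe for Hamming loss is every label seen in either set, so consult the API documentation for the exact definitions.

```scala
object Multilabel {
  // One document: (predicted label set, true label set).
  type Doc = (Set[Int], Set[Int])

  // Fraction of documents whose predicted set equals the true set exactly.
  def subsetAccuracy(docs: Seq[Doc]): Double =
    docs.count { case (p, t) => p == t }.toDouble / docs.size

  // Mismatched labels (symmetric difference) over documents x labels.
  // Assumption: the label universe is every label seen in either set.
  def hammingLoss(docs: Seq[Doc]): Double = {
    val numLabels = docs.flatMap { case (p, t) => p ++ t }.distinct.size
    val mismatches = docs.map { case (p, t) => ((p -- t) ++ (t -- p)).size }.sum
    mismatches.toDouble / (docs.size * numLabels)
  }

  private def truePositives(docs: Seq[Doc]): Int =
    docs.map { case (p, t) => (p & t).size }.sum

  // Micro-averaging pools counts over all documents before dividing.
  def microPrecision(docs: Seq[Doc]): Double =
    truePositives(docs).toDouble / docs.map(_._1.size).sum

  def microRecall(docs: Seq[Doc]): Double =
    truePositives(docs).toDouble / docs.map(_._2.size).sum
}
```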

spark git commit: [SPARK-6094] [MLLIB] Add MultilabelMetrics in PySpark/MLlib

2015-05-20 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.4 996e2d4b3 -> 606ae3e10 [SPARK-6094] [MLLIB] Add MultilabelMetrics in PySpark/MLlib Add MultilabelMetrics in PySpark/MLlib Author: Yanbo Liang yblia...@gmail.com Closes #6276 from yanboliang/spark-6094 and squashes the following

spark git commit: [SPARK-7654] [MLLIB] Migrate MLlib to the DataFrame reader/writer API

2015-05-20 Thread meng
Repository: spark Updated Branches: refs/heads/master 3ddf051ee -> 589b12f8e [SPARK-7654] [MLLIB] Migrate MLlib to the DataFrame reader/writer API parquetFile -> read.parquet rxin Author: Xiangrui Meng m...@databricks.com Closes #6281 from mengxr/SPARK-7654 and squashes the following commits: