svn commit: r1675040 - /spark/site/mllib/index.html

2015-04-21 Thread meng
Author: meng Date: Tue Apr 21 06:25:38 2015 New Revision: 1675040 URL: http://svn.apache.org/r1675040 Log: update mllib/index.html Modified: spark/site/mllib/index.html Modified: spark/site/mllib/index.html URL:

spark git commit: [SPARK-6490][Core] Add spark.rpc.* and deprecate spark.akka.*

2015-04-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master c736220da -> 8136810df [SPARK-6490][Core] Add spark.rpc.* and deprecate spark.akka.* Deprecated `spark.akka.num.retries`, `spark.akka.retry.wait`, `spark.akka.askTimeout`, `spark.akka.lookupTimeout`, and added `spark.rpc.num.retries`,
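The deprecation pattern this change describes can be sketched in plain Python. This is purely illustrative, not Spark's actual Scala config-resolution code; the dictionary-based conf and helper name are assumptions. The new `spark.rpc.*` key wins, but a value set under the deprecated `spark.akka.*` alias is still honored:

```python
# Illustrative sketch only: Spark's real logic lives in its Scala config code.
DEPRECATED_ALIASES = {
    "spark.rpc.num.retries": "spark.akka.num.retries",
    "spark.rpc.retry.wait": "spark.akka.retry.wait",
}

def get_with_fallback(conf, key, default):
    """Prefer the new key; fall back to the deprecated alias, then the default."""
    if key in conf:
        return conf[key]
    alias = DEPRECATED_ALIASES.get(key)
    if alias is not None and alias in conf:
        return conf[alias]  # deprecated key still honored for compatibility
    return default

conf = {"spark.akka.num.retries": "5"}  # user still sets the old key
value = get_with_fallback(conf, "spark.rpc.num.retries", "3")
```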

spark git commit: [SPARK-5990] [MLLIB] Model import/export for IsotonicRegression

2015-04-21 Thread meng
Repository: spark Updated Branches: refs/heads/master ab9128fb7 -> 1f2f723b0 [SPARK-5990] [MLLIB] Model import/export for IsotonicRegression Model import/export for IsotonicRegression Author: Yanbo Liang yblia...@gmail.com Closes #5270 from yanboliang/spark-5990 and squashes the following

spark git commit: [SPARK-6949] [SQL] [PySpark] Support Date/Timestamp in Column expression

2015-04-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master 8136810df -> ab9128fb7 [SPARK-6949] [SQL] [PySpark] Support Date/Timestamp in Column expression This PR enables auto_convert in JavaGateway, so that a converter can be registered for given types, for example, date and datetime. There are
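The excerpt describes registering per-type converters so Python date/datetime values can be turned into whatever the JVM side expects before a Column expression is built. A rough illustrative sketch of that idea in plain Python follows; this is not Py4J's actual converter API, and the epoch-based encodings are assumptions made for the example:

```python
from datetime import date, datetime

# Hypothetical converter registry, keyed by exact Python type.
CONVERTERS = {
    date: lambda d: d.toordinal() - date(1970, 1, 1).toordinal(),   # days since epoch (assumed encoding)
    datetime: lambda dt: int(dt.timestamp() * 1000),                # millis since epoch (assumed encoding)
}

def convert(value):
    """Apply a registered converter if one exists; pass other values through."""
    conv = CONVERTERS.get(type(value))
    return conv(value) if conv else value
```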

spark git commit: [SPARK-5360] [SPARK-6606] Eliminate duplicate objects in serialized CoGroupedRDD

2015-04-21 Thread kayousterhout
Repository: spark Updated Branches: refs/heads/master 5fea3e5c3 -> c035c0f2d [SPARK-5360] [SPARK-6606] Eliminate duplicate objects in serialized CoGroupedRDD CoGroupPartition, part of CoGroupedRDD, includes references to each RDD that the CoGroupedRDD narrowly depends on, and a reference to
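The size effect of duplicating a shared reference into every serialized partition can be demonstrated with plain `pickle` (an illustrative analogue, not Spark's Java serialization path): a single stream stores the shared object once via memoization, while serializing each partition separately carries a full copy every time.

```python
import pickle

shared = list(range(1000))  # stands in for the RDD reference shared by all partitions
partitions = [("partition", i, shared) for i in range(10)]

# One stream: 'shared' is pickled once, later occurrences are memo references.
together = len(pickle.dumps(partitions))

# Per-partition streams: each dump carries its own full copy of 'shared'.
separate = sum(len(pickle.dumps(p)) for p in partitions)
```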

svn commit: r8672 - /dev/spark/spark-1.2.2-rc1/spark-1.2.2-bin-hadoop2.4.tgz.asc /release/spark/spark-1.2.2/spark-1.2.2-bin-hadoop2.4.tgz.asc

2015-04-21 Thread pwendell
Author: pwendell Date: Tue Apr 21 18:51:16 2015 New Revision: 8672 Log: Spark 1.2.2 Hadoop 2.4 ASC Added: release/spark/spark-1.2.2/spark-1.2.2-bin-hadoop2.4.tgz.asc - copied unchanged from r8671, dev/spark/spark-1.2.2-rc1/spark-1.2.2-bin-hadoop2.4.tgz.asc Removed:

svn commit: r8673 - /dev/spark/spark-1.2.2-rc1/spark-1.2.2-bin-hadoop2.4.tgz.md5 /release/spark/spark-1.2.2/spark-1.2.2-bin-hadoop2.4.tgz.md5

2015-04-21 Thread pwendell
Author: pwendell Date: Tue Apr 21 18:51:32 2015 New Revision: 8673 Log: Spark 1.2.2 Hadoop 2.4 MD5 Added: release/spark/spark-1.2.2/spark-1.2.2-bin-hadoop2.4.tgz.md5 - copied unchanged from r8672, dev/spark/spark-1.2.2-rc1/spark-1.2.2-bin-hadoop2.4.tgz.md5 Removed:

Git Push Summary

2015-04-21 Thread pwendell
Repository: spark Updated Tags: refs/tags/v1.2.2-rc1 [created] 7531b50e4

svn commit: r8674 - /dev/spark/spark-1.2.2-rc1/spark-1.2.2-bin-hadoop2.4.tgz.sha /release/spark/spark-1.2.2/spark-1.2.2-bin-hadoop2.4.tgz.sha

2015-04-21 Thread pwendell
Author: pwendell Date: Tue Apr 21 18:52:08 2015 New Revision: 8674 Log: Spark 1.2.2 Hadoop 2.4 Sha Added: release/spark/spark-1.2.2/spark-1.2.2-bin-hadoop2.4.tgz.sha - copied unchanged from r8673, dev/spark/spark-1.2.2-rc1/spark-1.2.2-bin-hadoop2.4.tgz.sha Removed:

svn commit: r8671 - /dev/spark/spark-1.2.2-rc1/spark-1.2.2-bin-hadoop2.4.tgz /release/spark/spark-1.2.2/spark-1.2.2-bin-hadoop2.4.tgz

2015-04-21 Thread pwendell
Author: pwendell Date: Tue Apr 21 18:50:59 2015 New Revision: 8671 Log: Spark 1.2.2 Hadoop 2.4 TGZ Added: release/spark/spark-1.2.2/spark-1.2.2-bin-hadoop2.4.tgz - copied unchanged from r8670, dev/spark/spark-1.2.2-rc1/spark-1.2.2-bin-hadoop2.4.tgz Removed:

svn commit: r8670 - in /dev/spark/spark-1.2.2-rc1: ./ spark-1.2.2-bin-hadoop2.4.tgz spark-1.2.2-bin-hadoop2.4.tgz.asc spark-1.2.2-bin-hadoop2.4.tgz.md5 spark-1.2.2-bin-hadoop2.4.tgz.sha

2015-04-21 Thread pwendell
Author: pwendell Date: Tue Apr 21 18:49:30 2015 New Revision: 8670 Log: Adding missing Hadoop 2.4 binary for Spark 1.2.2 Added: dev/spark/spark-1.2.2-rc1/ dev/spark/spark-1.2.2-rc1/spark-1.2.2-bin-hadoop2.4.tgz (with props)

spark git commit: [SPARK-6996][SQL] Support map types in java beans

2015-04-21 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 6265cba00 -> 2a24bf92e [SPARK-6996][SQL] Support map types in java beans liancheng mengxr this is similar to #5146. Author: Punya Biswal pbis...@palantir.com Closes #5578 from punya/feature/SPARK-6996 and squashes the following commits:

spark git commit: SPARK-3276 Added a new configuration spark.streaming.minRememberDuration

2015-04-21 Thread srowen
Repository: spark Updated Branches: refs/heads/master c035c0f2d -> c25ca7c5a SPARK-3276 Added a new configuration spark.streaming.minRememberDuration SPARK-3276 Added a new configuration parameter ``spark.streaming.minRememberDuration``, with a default value of 1 minute. So that when a Spark

spark git commit: [SPARK-7011] Build(compilation) fails with scala 2.11 option, because a protected[sql] type is accessed in ml package.

2015-04-21 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 45c47fa41 -> 04bf34e34 [SPARK-7011] Build(compilation) fails with scala 2.11 option, because a protected[sql] type is accessed in ml package.

spark git commit: [SPARK-5817] [SQL] Fix bug of udtf with column names

2015-04-21 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 2a24bf92e -> 7662ec23b [SPARK-5817] [SQL] Fix bug of udtf with column names It's a bug when running a query like: ```sql select d from (select explode(array(1,1)) d from src limit 1) t ``` and it throws an exception like: ```

spark git commit: [SPARK-6845] [MLlib] [PySpark] Add isTransposed flag to DenseMatrix

2015-04-21 Thread meng
Repository: spark Updated Branches: refs/heads/master c25ca7c5a -> 45c47fa41 [SPARK-6845] [MLlib] [PySpark] Add isTransposed flag to DenseMatrix Since sparse matrices now support an isTransposed flag for row major data, DenseMatrices should do the same. Author: MechCoder
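The flag's effect on indexing can be sketched in plain Python. This hypothetical class is not Spark's DenseMatrix; it only illustrates how one flat array can be read as either column-major (the default) or row-major data without physically transposing it:

```python
class DenseMatrixSketch:
    """Toy matrix over flat storage; isTransposed picks the indexing scheme."""
    def __init__(self, num_rows, num_cols, values, is_transposed=False):
        self.num_rows, self.num_cols = num_rows, num_cols
        self.values = values  # flat storage, layout decided by the flag
        self.is_transposed = is_transposed

    def __getitem__(self, ij):
        i, j = ij
        if self.is_transposed:
            return self.values[i * self.num_cols + j]  # row-major layout
        return self.values[i + j * self.num_rows]      # column-major (default)

# The same logical 2x2 matrix [[1, 2], [3, 4]] from two storage layouts:
col_major = DenseMatrixSketch(2, 2, [1, 3, 2, 4])
row_major = DenseMatrixSketch(2, 2, [1, 2, 3, 4], is_transposed=True)
```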

spark git commit: [SPARK-6969][SQL] Refresh the cached table when REFRESH TABLE is used

2015-04-21 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 03fd92167 -> 6265cba00 [SPARK-6969][SQL] Refresh the cached table when REFRESH TABLE is used https://issues.apache.org/jira/browse/SPARK-6969 Author: Yin Huai yh...@databricks.com Closes #5583 from yhuai/refreshTableRefreshDataCache and
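The behavior the fix describes can be sketched with a toy cache (not Spark's CacheManager; the class and load function are invented for illustration): REFRESH TABLE should not just drop stale cached data, but re-cache the refreshed table if it was cached before.

```python
class TableCacheSketch:
    """Toy table cache illustrating invalidate-then-recache on refresh."""
    def __init__(self, load_fn):
        self._load = load_fn  # reads the current table data from storage
        self._cache = {}

    def cache_table(self, name):
        self._cache[name] = self._load(name)

    def refresh_table(self, name):
        was_cached = name in self._cache
        self._cache.pop(name, None)   # invalidate the stale entry
        if was_cached:
            self.cache_table(name)    # ...and eagerly re-cache fresh data

storage = {"t": [1, 2]}
cache = TableCacheSketch(lambda name: list(storage[name]))
cache.cache_table("t")
storage["t"].append(3)    # underlying data changes behind the cache
cache.refresh_table("t")  # cached copy now reflects the new data
```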

spark git commit: [SQL][minor] make it more clear that we only need to re-throw GetField exception for UnresolvedAttribute

2015-04-21 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 2e8c6ca47 -> 03fd92167 [SQL][minor] make it more clear that we only need to re-throw GetField exception for UnresolvedAttribute For `GetField` outside `UnresolvedAttribute`, we will throw exception in `Analyzer`. Author: Wenchen Fan

spark git commit: [SPARK-6994] Allow to fetch field values by name in sql.Row

2015-04-21 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 04bf34e34 -> 2e8c6ca47 [SPARK-6994] Allow to fetch field values by name in sql.Row It looked weird that up to now there was no way in Spark's Scala API to access fields of `DataFrame/sql.Row` by name, only by their index. This tries to
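The idea of resolving a field through the schema rather than by position can be sketched in Python (the actual change is to the Scala Row API; this toy class and its method names are invented for illustration):

```python
class RowSketch:
    """Toy row: positional access plus name-based access via the schema."""
    def __init__(self, schema, values):
        self._index = {name: i for i, name in enumerate(schema)}
        self._values = list(values)

    def get(self, i):
        return self._values[i]                    # existing path: by position

    def get_as(self, name):
        return self._values[self._index[name]]    # new path: by field name

row = RowSketch(["id", "name"], [42, "spark"])
```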

spark git commit: [SPARK-3386] Share and reuse SerializerInstances in shuffle paths

2015-04-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master 7662ec23b -> f83c0f112 [SPARK-3386] Share and reuse SerializerInstances in shuffle paths This patch modifies several shuffle-related code paths to share and re-use SerializerInstances instead of creating new ones. Some serializers, such
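Why reuse matters can be shown with a toy serializer that counts its own constructions (illustrative only; Spark's SerializerInstance is a Scala abstraction, and instantiation cost is the expensive part for some serializers):

```python
class CountingSerializer:
    """Toy serializer that tracks how many instances were constructed."""
    instances_created = 0

    def __init__(self):
        CountingSerializer.instances_created += 1  # stands in for costly setup

    def serialize(self, obj):
        return repr(obj).encode()

def write_records_naive(records):
    # Anti-pattern: a fresh instance per record.
    return [CountingSerializer().serialize(r) for r in records]

def write_records_shared(records):
    ser = CountingSerializer()  # one shared instance for the whole path
    return [ser.serialize(r) for r in records]
```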

spark git commit: [minor] [build] Set java options when generating mima ignores.

2015-04-21 Thread pwendell
Repository: spark Updated Branches: refs/heads/master f83c0f112 -> a70e849c7 [minor] [build] Set java options when generating mima ignores. The default java options make the call to GenerateMIMAIgnore take forever to run since it's gc'ing all the time. Improve things by setting the perm gen

spark git commit: [SPARK-6065] [MLlib] Optimize word2vec.findSynonyms using blas calls

2015-04-21 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master a70e849c7 -> 7fe6142cd [SPARK-6065] [MLlib] Optimize word2vec.findSynonyms using blas calls 1. Use blas calls to find the dot product between two vectors. 2. Prevent re-computing the L2 norm of the given vector for each word in model.
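The second point (not recomputing norms per word) can be sketched in pure Python. The toy vectors below are made up, and the real implementation uses BLAS dot calls over the model's flat array; this only illustrates precomputing the model norms once and the query norm once per call:

```python
import math

# Hypothetical toy model: word -> vector.
vectors = {"king": [1.0, 2.0], "queen": [1.5, 2.5], "car": [3.0, -1.0]}
# L2 norms computed once per model, not once per query word.
norms = {w: math.sqrt(sum(x * x for x in v)) for w, v in vectors.items()}

def find_synonyms(query, num):
    qnorm = math.sqrt(sum(x * x for x in query))  # once per query, not per word
    sims = {
        w: sum(a * b for a, b in zip(query, v)) / (norms[w] * qnorm)
        for w, v in vectors.items()
    }
    return sorted(sims, key=sims.get, reverse=True)[:num]
```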

spark git commit: [SPARK-6490][Docs] Add docs for rpc configurations

2015-04-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master a0761ec70 -> 3a3f7100f [SPARK-6490][Docs] Add docs for rpc configurations Added docs for rpc configurations and also fixed two places that should have been fixed in #5595. Author: zsxwing zsxw...@gmail.com Closes #5607 from

spark git commit: [MINOR] Comment improvements in ExternalSorter.

2015-04-21 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 3a3f7100f -> 70f9f8ff3 [MINOR] Comment improvements in ExternalSorter. 1. Clearly specifies the contract/interactions for users of this class. 2. Minor fix in one doc to avoid ambiguity. Author: Patrick Wendell patr...@databricks.com

spark git commit: [SPARK-6889] [DOCS] CONTRIBUTING.md updates to accompany contribution doc updates

2015-04-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master 607eff0ed -> bdc5c16e7 [SPARK-6889] [DOCS] CONTRIBUTING.md updates to accompany contribution doc updates Part of the SPARK-6889 doc updates, to accompany wiki updates at

spark git commit: [SPARK-6113] [ML] Small cleanups after original tree API PR

2015-04-21 Thread meng
Repository: spark Updated Branches: refs/heads/master 70f9f8ff3 -> 607eff0ed [SPARK-6113] [ML] Small cleanups after original tree API PR This does a few clean-ups. With this PR, all spark.ml tree components have ```private[ml]``` constructors. CC: mengxr Author: Joseph K. Bradley

spark git commit: [Minor][MLLIB] Fix a minor formatting bug in toString method in Node.scala

2015-04-21 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.3 fd61820d3 -> 4508f0189 [Minor][MLLIB] Fix a minor formatting bug in toString method in Node.scala add missing comma and space Author: Alain a...@usc.edu Closes #5621 from AiHe/tree-node-issue and squashes the following commits:

spark git commit: [SPARK-6014] [core] Revamp Spark shutdown hooks, fix shutdown races.

2015-04-21 Thread srowen
Repository: spark Updated Branches: refs/heads/master b063a61b9 -> e72c16e30 [SPARK-6014] [core] Revamp Spark shutdown hooks, fix shutdown races. This change adds some new utility code to handle shutdown hooks in Spark. The main goal is to take advantage of Hadoop 2.x's API for shutdown hooks,
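The core ordering idea behind such a revamp can be sketched as a priority-ordered hook manager (a toy class, not Spark's Utils code; the priority values and hook names below are invented for illustration): higher-priority hooks run first, so for example a context-stopping hook can run before temp-dir cleanup.

```python
class ShutdownHookManagerSketch:
    """Toy manager: hooks run highest-priority first, stable within a priority."""
    def __init__(self):
        self._hooks = []   # (priority, insertion order, callable)
        self._count = 0

    def add(self, priority, fn):
        self._hooks.append((priority, self._count, fn))
        self._count += 1

    def run_all(self):
        # Sort by descending priority; insertion order breaks ties deterministically.
        for _, _, fn in sorted(self._hooks, key=lambda h: (-h[0], h[1])):
            fn()

ran = []
mgr = ShutdownHookManagerSketch()
mgr.add(25, lambda: ran.append("temp-dir cleanup"))
mgr.add(50, lambda: ran.append("stop context"))
mgr.run_all()
```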

spark git commit: Closes #5427

2015-04-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master 3134c3fe4 -> 41ef78a94 Closes #5427 Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/41ef78a9 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/41ef78a9 Diff:

spark git commit: [SPARK-1684] [PROJECT INFRA] Merge script should standardize SPARK-XXX prefix

2015-04-21 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 41ef78a94 -> a0761ec70 [SPARK-1684] [PROJECT INFRA] Merge script should standardize SPARK-XXX prefix Cleans up the pull request title in the merge script to follow conventions outlined in the wiki under Contributing Code.

spark git commit: [SPARK-7036][MLLIB] ALS.train should support DataFrames in PySpark

2015-04-21 Thread meng
Repository: spark Updated Branches: refs/heads/master 7fe6142cd -> 686dd742e [SPARK-7036][MLLIB] ALS.train should support DataFrames in PySpark SchemaRDD works with ALS.train in 1.2, so we should continue to support DataFrames for compatibility. coderxiang Author: Xiangrui Meng

spark git commit: [SPARK-7036][MLLIB] ALS.train should support DataFrames in PySpark

2015-04-21 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.3 948f2f635 -> fd61820d3 [SPARK-7036][MLLIB] ALS.train should support DataFrames in PySpark SchemaRDD works with ALS.train in 1.2, so we should continue to support DataFrames for compatibility. coderxiang Author: Xiangrui Meng

spark git commit: [SPARK-6953] [PySpark] speed up python tests

2015-04-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master e72c16e30 -> 3134c3fe4 [SPARK-6953] [PySpark] speed up python tests This PR tries to speed up some python tests: ``` tests.py 144s -> 103s (-41s) mllib/classification.py 24s -> 17s (-7s)

spark git commit: [SPARK-6985][streaming] Receiver maxRate over 1000 causes a StackOverflowError

2015-04-21 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-1.3 8549ff4f0 -> 948f2f635 [SPARK-6985][streaming] Receiver maxRate over 1000 causes a StackOverflowError A simple truncation in integer division (on rates over 1000 messages / second) causes the existing implementation to sleep for 0

spark git commit: [SPARK-6985][streaming] Receiver maxRate over 1000 causes a StackOverflowError

2015-04-21 Thread srowen
Repository: spark Updated Branches: refs/heads/master 1f2f723b0 -> 5fea3e5c3 [SPARK-6985][streaming] Receiver maxRate over 1000 causes a StackOverflowError A simple truncation in integer division (on rates over 1000 messages / second) causes the existing implementation to sleep for 0
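The truncation described above is easy to reproduce in any language with integer division (a plain-Python sketch of the arithmetic, not Spark's RateLimiter code; the finer-resolution fix shown is one possible remedy, not necessarily the exact one merged):

```python
def sleep_ms_buggy(max_rate):
    # Millisecond pause per message: truncates to 0 whenever max_rate > 1000,
    # so the limiter never actually blocks.
    return 1000 // max_rate

def sleep_ns_finer(max_rate):
    # Nanosecond resolution keeps the per-message pause nonzero for any
    # realistic rate limit.
    return 1_000_000_000 // max_rate
```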