spark git commit: [SPARK-4966][YARN] The MemoryOverhead value is not set correctly

2014-12-29 Thread tgraves
Repository: spark
Updated Branches:
  refs/heads/branch-1.2 23d64cf08 -> 2cd446a90


[SPARK-4966][YARN] The MemoryOverhead value is not set correctly

Author: meiyoula <1039320...@qq.com>

Closes #3797 from XuTingjun/MemoryOverhead and squashes the following commits:

5a780fc [meiyoula] Update ClientArguments.scala

(cherry picked from commit 14fa87bdf4b89cd392270864ee063ce01bd31887)
Signed-off-by: Thomas Graves <tgra...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2cd446a9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2cd446a9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2cd446a9

Branch: refs/heads/branch-1.2
Commit: 2cd446a90216ac8eb19584c760685fbb470c4301
Parents: 23d64cf
Author: meiyoula <1039320...@qq.com>
Authored: Mon Dec 29 08:20:30 2014 -0600
Committer: Thomas Graves <tgra...@apache.org>
Committed: Mon Dec 29 08:21:19 2014 -0600

--
 .../main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/2cd446a9/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala
--
diff --git a/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala b/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala
index 4d85945..7687a9b 100644
--- a/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala
+++ b/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala
@@ -39,6 +39,8 @@ private[spark] class ClientArguments(args: Array[String], sparkConf: SparkConf)
   var appName: String = "Spark"
   var priority = 0
 
+  parseArgs(args.toList)
+
   // Additional memory to allocate to containers
  // For now, use driver's memory overhead as our AM container's memory overhead
  val amMemoryOverhead = sparkConf.getInt("spark.yarn.driver.memoryOverhead",
@@ -50,7 +52,6 @@ private[spark] class ClientArguments(args: Array[String], sparkConf: SparkConf)
   private val isDynamicAllocationEnabled =
 sparkConf.getBoolean("spark.dynamicAllocation.enabled", false)
 
-  parseArgs(args.toList)
   loadEnvironmentArgs()
   validateArgs()
 





spark git commit: [SPARK-4982][DOC] `spark.ui.retainedJobs` description is wrong in Spark UI configuration guide

2014-12-29 Thread joshrosen
Repository: spark
Updated Branches:
  refs/heads/branch-1.2 2cd446a90 -> 76046664d


[SPARK-4982][DOC] `spark.ui.retainedJobs` description is wrong in Spark UI 
configuration guide

Author: wangxiaojing <u9j...@gmail.com>

Closes #3818 from wangxiaojing/SPARK-4982 and squashes the following commits:

fe2ad5f [wangxiaojing] change stages to jobs

(cherry picked from commit 6645e52580747990321e22340ae742f26d2f2504)
Signed-off-by: Josh Rosen <joshro...@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/76046664
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/76046664
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/76046664

Branch: refs/heads/branch-1.2
Commit: 76046664dc9bd830b10c9e4786c211b4407a81e0
Parents: 2cd446a
Author: wangxiaojing <u9j...@gmail.com>
Authored: Mon Dec 29 10:45:14 2014 -0800
Committer: Josh Rosen <joshro...@databricks.com>
Committed: Mon Dec 29 10:46:13 2014 -0800

--
 docs/configuration.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/76046664/docs/configuration.md
--
diff --git a/docs/configuration.md b/docs/configuration.md
index 60fde13..d0fbf1a 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -452,7 +452,7 @@ Apart from these, the following properties are also available, and may be useful
   <td><code>spark.ui.retainedJobs</code></td>
   <td>1000</td>
   <td>
-How many stages the Spark UI and status APIs remember before garbage
+How many jobs the Spark UI and status APIs remember before garbage
 collecting.
   </td>
 </tr>
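For context, this property (and its per-stage sibling) is set like any other Spark conf; a minimal sketch:

```scala
// Lower the retention caps to reduce driver memory used by UI history.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("ui-retention-example")
  .set("spark.ui.retainedJobs", "500")    // completed jobs kept by the UI/status APIs
  .set("spark.ui.retainedStages", "500")  // the analogous setting for stages
```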





spark git commit: [SPARK-4946] [CORE] Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to reduce the chance of communication problems

2014-12-29 Thread joshrosen
Repository: spark
Updated Branches:
  refs/heads/master 4cef05e1c -> 815de5400


[SPARK-4946] [CORE] Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to reduce the chance of communication problems

Using AkkaUtils.askWithReply in MapOutputTracker.askTracker reduces the chance of communication problems.

Author: YanTangZhai <hakeemz...@tencent.com>
Author: yantangzhai <tyz0...@163.com>

Closes #3785 from YanTangZhai/SPARK-4946 and squashes the following commits:

9ca6541 [yantangzhai] [SPARK-4946] [CORE] Using AkkaUtils.askWithReply in
MapOutputTracker.askTracker to reduce the chance of communication problems
e4c2c0a [YanTangZhai] Merge pull request #15 from apache/master
718afeb [YanTangZhai] Merge pull request #12 from apache/master
6e643f8 [YanTangZhai] Merge pull request #11 from apache/master
e249846 [YanTangZhai] Merge pull request #10 from apache/master
d26d982 [YanTangZhai] Merge pull request #9 from apache/master
76d4027 [YanTangZhai] Merge pull request #8 from apache/master
03b62b0 [YanTangZhai] Merge pull request #7 from apache/master
8a00106 [YanTangZhai] Merge pull request #6 from apache/master
cbcba66 [YanTangZhai] Merge pull request #3 from apache/master
cdef539 [YanTangZhai] Merge pull request #1 from apache/master


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/815de540
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/815de540
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/815de540

Branch: refs/heads/master
Commit: 815de54002f9c1cfedc398e95896fa207b4a5305
Parents: 4cef05e
Author: YanTangZhai <hakeemz...@tencent.com>
Authored: Mon Dec 29 11:30:54 2014 -0800
Committer: Josh Rosen <joshro...@databricks.com>
Committed: Mon Dec 29 11:30:54 2014 -0800

--
 core/src/main/scala/org/apache/spark/MapOutputTracker.scala | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/815de540/core/src/main/scala/org/apache/spark/MapOutputTracker.scala
--
diff --git a/core/src/main/scala/org/apache/spark/MapOutputTracker.scala b/core/src/main/scala/org/apache/spark/MapOutputTracker.scala
index a074ab8..6e4edc7 100644
--- a/core/src/main/scala/org/apache/spark/MapOutputTracker.scala
+++ b/core/src/main/scala/org/apache/spark/MapOutputTracker.scala
@@ -76,6 +76,8 @@ private[spark] class MapOutputTrackerMasterActor(tracker: MapOutputTrackerMaster
  */
 private[spark] abstract class MapOutputTracker(conf: SparkConf) extends Logging {
   private val timeout = AkkaUtils.askTimeout(conf)
+  private val retryAttempts = AkkaUtils.numRetries(conf)
+  private val retryIntervalMs = AkkaUtils.retryWaitMs(conf)
 
   /** Set to the MapOutputTrackerActor living on the driver. */
   var trackerActor: ActorRef = _
@@ -108,8 +110,7 @@ private[spark] abstract class MapOutputTracker(conf: SparkConf) extends Logging
    */
   protected def askTracker(message: Any): Any = {
 try {
-  val future = trackerActor.ask(message)(timeout)
-  Await.result(future, timeout)
+  AkkaUtils.askWithReply(message, trackerActor, retryAttempts, retryIntervalMs, timeout)
 } catch {
   case e: Exception =>
 logError("Error communicating with MapOutputTracker", e)
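The retry loop behind `askWithReply` boils down to re-issuing a bounded ask until a reply arrives or the attempts run out. A generic, self-contained sketch of that pattern (an illustrative helper, not Spark's private API):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.FiniteDuration

// Retry a future-producing send until it yields a result within the timeout,
// sleeping between attempts; rethrow the last error once attempts are exhausted.
def askWithRetry[T](attempts: Int, waitMs: Long, timeout: FiniteDuration)
                   (send: => Future[T]): T = {
  var lastError: Throwable = null
  for (_ <- 1 to attempts) {
    try {
      return Await.result(send, timeout)  // one bounded ask
    } catch {
      case e: Exception =>
        lastError = e
        Thread.sleep(waitMs)              // back off before the next attempt
    }
  }
  throw new RuntimeException(s"Failed after $attempts attempts", lastError)
}
```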





spark git commit: [Minor] Fix a typo in a type parameter in JavaUtils.scala

2014-12-29 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 815de5400 -> 8d72341ab


[Minor] Fix a typo in a type parameter in JavaUtils.scala

In JavaUtils.scala, there is a typo in a type parameter. In addition, the type
information is removed at compile time by erasure.

This issue is really minor, so I didn't file a JIRA.
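A small standalone demonstration (assumed, not from the patch) of why the fix is cosmetic: the type argument in the pattern is erased at runtime, so the case matches any `mutable.Map` and the compiler emits an unchecked warning either way.

```scala
import scala.collection.mutable

val underlying: Any = mutable.Map(42 -> "v")  // keys are Int, not String
underlying match {
  // String is erased at runtime, so this still matches the Int-keyed map;
  // scalac flags the pattern with an "unchecked" warning.
  case mm: mutable.Map[String, _] => println(s"matched anyway: $mm")
  case _ => println("no match")
}
```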

Author: Kousuke Saruta <saru...@oss.nttdata.co.jp>

Closes #3789 from sarutak/fix-typo-in-javautils and squashes the following 
commits:

e20193d [Kousuke Saruta] Fixed a typo of type parameter
82bc5d9 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark 
into fix-typo-in-javautils
99f6f63 [Kousuke Saruta] Fixed a typo of type parameter in JavaUtils.scala


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8d72341a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8d72341a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8d72341a

Branch: refs/heads/master
Commit: 8d72341ab75a7fb138b056cfb4e21db42aca55fb
Parents: 815de54
Author: Kousuke Saruta <saru...@oss.nttdata.co.jp>
Authored: Mon Dec 29 12:05:08 2014 -0800
Committer: Reynold Xin <r...@databricks.com>
Committed: Mon Dec 29 12:05:08 2014 -0800

--
 core/src/main/scala/org/apache/spark/api/java/JavaUtils.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/8d72341a/core/src/main/scala/org/apache/spark/api/java/JavaUtils.scala
--
diff --git a/core/src/main/scala/org/apache/spark/api/java/JavaUtils.scala b/core/src/main/scala/org/apache/spark/api/java/JavaUtils.scala
index 86e9493..71b2673 100644
--- a/core/src/main/scala/org/apache/spark/api/java/JavaUtils.scala
+++ b/core/src/main/scala/org/apache/spark/api/java/JavaUtils.scala
@@ -80,7 +80,7 @@ private[spark] object JavaUtils {
   prev match {
 case Some(k) =>
   underlying match {
-case mm: mutable.Map[a, _] =>
+case mm: mutable.Map[A, _] =>
   mm remove k
   prev = None
 case _ =>





spark git commit: [SPARK-4409][MLlib] Additional Linear Algebra Utils

2014-12-29 Thread meng
Repository: spark
Updated Branches:
  refs/heads/master 8d72341ab -> 02b55de3d


[SPARK-4409][MLlib] Additional Linear Algebra Utils

This adds a small set of local matrix manipulation and generation methods that
will be helpful in further development of algorithms on top of BlockMatrix
(SPARK-3974), such as Randomized SVD and Multi Model Training (SPARK-1486).
The proposed methods for addition are:

For `Matrix`
 - map: maps the values in the matrix with a given function. Produces a new 
matrix.
 - update: the values in the matrix are updated with a given function. Occurs 
in place.

Factory methods for `DenseMatrix`:
 - *zeros: Generate a matrix consisting of zeros
 - *ones: Generate a matrix consisting of ones
 - *eye: Generate an identity matrix
 - *rand: Generate a matrix consisting of i.i.d. uniform random numbers
 - *randn: Generate a matrix consisting of i.i.d. Gaussian random numbers
 - *diag: Generate a diagonal matrix from a supplied vector
*These methods already exist as factory methods on `Matrices`; however, for
cases where we require a `DenseMatrix`, you constantly have to add
`.asInstanceOf[DenseMatrix]` everywhere, which makes the code dirtier. I
propose moving these functions to factory methods on `DenseMatrix`, where the
output will be a `DenseMatrix`, and having the factory methods on `Matrices`
call these functions directly and output a generic `Matrix`.

Factory methods for `SparseMatrix`:
 - speye: Identity matrix in sparse format. Saves a ton of memory when 
dimensions are large, especially in Multi Model Training, where each row 
requires being multiplied by a scalar.
 - sprand: Generate a sparse matrix with a given density consisting of i.i.d.
uniform random numbers.
 - sprandn: Generate a sparse matrix with a given density consisting of i.i.d.
Gaussian random numbers.
 - diag: Generate a diagonal matrix from a supplied vector in a memory-efficient
way, storing only the diagonal. Again, very helpful in Multi Model Training.

Factory methods for `Matrices`:
 - Include all the factory methods given above, but return a generic `Matrix` 
rather than `SparseMatrix` or `DenseMatrix`.
 - horzCat: Horizontally concatenate matrices to form one larger matrix. Very 
useful in both Multi Model Training, and for the repartitioning of BlockMatrix.
 - vertCat: Vertically concatenate matrices to form one larger matrix. Very 
useful for the repartitioning of BlockMatrix.

The names for these methods were chosen to match MATLAB's.
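A brief usage sketch of the proposed API (signatures assumed from the description above; `rand`/`randn` take an explicit `java.util.Random` in MLlib):

```scala
import java.util.Random
import org.apache.spark.mllib.linalg.{DenseMatrix, Matrices, Matrix, SparseMatrix}

val rng = new Random(42)
val z  = DenseMatrix.zeros(3, 3)            // statically typed as DenseMatrix, no casts
val e  = DenseMatrix.eye(3)                 // 3x3 identity
val u  = DenseMatrix.rand(3, 3, rng)        // i.i.d. uniform entries
val sp = SparseMatrix.speye(1000)           // sparse identity: stores only the diagonal
val g  = Matrices.randn(3, 3, rng)          // generic Matrix of i.i.d. Gaussian entries
val h  = Matrices.horzCat(Array[Matrix](z, e))  // 3x6 concatenation, side by side
```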

Author: Burak Yavuz <brk...@gmail.com>
Author: Xiangrui Meng <m...@databricks.com>

Closes #3319 from brkyvz/SPARK-4409 and squashes the following commits:

b0354f6 [Burak Yavuz] [SPARK-4409] Incorporated mengxr's code
04c4829 [Burak Yavuz] Merge pull request #1 from mengxr/SPARK-4409
80cfa29 [Xiangrui Meng] minor changes
ecc937a [Xiangrui Meng] update sprand
4e95e24 [Xiangrui Meng] simplify fromCOO implementation
10a63a6 [Burak Yavuz] [SPARK-4409] Fourth pass of code review
f62d6c7 [Burak Yavuz] [SPARK-4409] Modified genRandMatrix
3971c93 [Burak Yavuz] [SPARK-4409] Third pass of code review
75239f8 [Burak Yavuz] [SPARK-4409] Second pass of code review
e4bd0c0 [Burak Yavuz] [SPARK-4409] Modified horzcat and vertcat
65c562e [Burak Yavuz] [SPARK-4409] Hopefully fixed Java Test
d8be7bc [Burak Yavuz] [SPARK-4409] Organized imports
065b531 [Burak Yavuz] [SPARK-4409] First pass after code review
a8120d2 [Burak Yavuz] [SPARK-4409] Finished updates to API according to 
SPARK-4614
f798c82 [Burak Yavuz] [SPARK-4409] Updated API according to SPARK-4614
c75f3cd [Burak Yavuz] [SPARK-4409] Added JavaAPI Tests, and fixed a couple of 
bugs
d662f9d [Burak Yavuz] [SPARK-4409] Modified according to remote repo
83dfe37 [Burak Yavuz] [SPARK-4409] Scalastyle error fixed
a14c0da [Burak Yavuz] [SPARK-4409] Initial commit to add methods


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/02b55de3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/02b55de3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/02b55de3

Branch: refs/heads/master
Commit: 02b55de3dce9a1fef806be13e5cefa0f39ea2fcc
Parents: 8d72341
Author: Burak Yavuz <brk...@gmail.com>
Authored: Mon Dec 29 13:24:26 2014 -0800
Committer: Xiangrui Meng <m...@databricks.com>
Committed: Mon Dec 29 13:24:26 2014 -0800

--
 .../apache/spark/mllib/linalg/Matrices.scala| 570 +--
 .../spark/mllib/linalg/JavaMatricesSuite.java   | 163 ++
 .../spark/mllib/linalg/MatricesSuite.scala  | 172 +-
 .../apache/spark/mllib/util/TestingUtils.scala  |   6 +-
 4 files changed, 868 insertions(+), 43 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/02b55de3/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala

spark git commit: SPARK-4968: takeOrdered to skip reduce step in case mappers return no partitions

2014-12-29 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 02b55de3d -> 9bc0df680


SPARK-4968: takeOrdered to skip reduce step in case mappers return no partitions

takeOrdered should skip the reduce step in case the mapped RDD has no
partitions. This prevents the exception below:

Run query:
SELECT * FROM testTable WHERE market = 'market2' ORDER BY End_Time DESC LIMIT 100;

Error trace:
java.lang.UnsupportedOperationException: empty collection
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:863)
at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1136)
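A minimal reproduction of the failure mode being fixed: an RDD with zero partitions yields no map-side queues, so the old code called `reduce` on an empty collection; with this patch, `takeOrdered` simply returns an empty array.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setMaster("local").setAppName("takeOrdered-empty"))
val empty = sc.emptyRDD[Int]            // an RDD with 0 partitions
// Before: java.lang.UnsupportedOperationException: empty collection
// After:  an empty result
println(empty.takeOrdered(100).toSeq)
sc.stop()
```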

Author: Yash Datta <yash.da...@guavus.com>

Closes #3830 from saucam/fix_takeorder and squashes the following commits:

5974d10 [Yash Datta] SPARK-4968: takeOrdered to skip reduce step in case 
mappers return no partitions


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9bc0df68
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9bc0df68
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9bc0df68

Branch: refs/heads/master
Commit: 9bc0df6804f241aff24520d9c6ec54d9b11f5785
Parents: 02b55de
Author: Yash Datta <yash.da...@guavus.com>
Authored: Mon Dec 29 13:49:45 2014 -0800
Committer: Reynold Xin <r...@databricks.com>
Committed: Mon Dec 29 13:49:45 2014 -0800

--
 core/src/main/scala/org/apache/spark/rdd/RDD.scala | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/9bc0df68/core/src/main/scala/org/apache/spark/rdd/RDD.scala
--
diff --git a/core/src/main/scala/org/apache/spark/rdd/RDD.scala b/core/src/main/scala/org/apache/spark/rdd/RDD.scala
index f47c2d1..5118e2b 100644
--- a/core/src/main/scala/org/apache/spark/rdd/RDD.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/RDD.scala
@@ -1146,15 +1146,20 @@ abstract class RDD[T: ClassTag](
 if (num == 0) {
   Array.empty
 } else {
-  mapPartitions { items =>
+  val mapRDDs = mapPartitions { items =>
 // Priority keeps the largest elements, so let's reverse the ordering.
 val queue = new BoundedPriorityQueue[T](num)(ord.reverse)
 queue ++= util.collection.Utils.takeOrdered(items, num)(ord)
 Iterator.single(queue)
-  }.reduce { (queue1, queue2) =>
-queue1 ++= queue2
-queue1
-  }.toArray.sorted(ord)
+  }
+  if (mapRDDs.partitions.size == 0) {
+Array.empty
+  } else {
+mapRDDs.reduce { (queue1, queue2) =>
+  queue1 ++= queue2
+  queue1
+}.toArray.sorted(ord)
+  }
 }
   }
 





spark git commit: SPARK-4968: takeOrdered to skip reduce step in case mappers return no partitions

2014-12-29 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-1.2 76046664d -> e81c86967


SPARK-4968: takeOrdered to skip reduce step in case mappers return no partitions

takeOrdered should skip the reduce step in case the mapped RDD has no
partitions. This prevents the exception below:

Run query:
SELECT * FROM testTable WHERE market = 'market2' ORDER BY End_Time DESC LIMIT 100;

Error trace:
java.lang.UnsupportedOperationException: empty collection
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:863)
at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1136)

Author: Yash Datta <yash.da...@guavus.com>

Closes #3830 from saucam/fix_takeorder and squashes the following commits:

5974d10 [Yash Datta] SPARK-4968: takeOrdered to skip reduce step in case 
mappers return no partitions

(cherry picked from commit 9bc0df6804f241aff24520d9c6ec54d9b11f5785)
Signed-off-by: Reynold Xin <r...@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e81c8696
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e81c8696
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e81c8696

Branch: refs/heads/branch-1.2
Commit: e81c869677b566dfcabedca89a40aeea7dc16fa9
Parents: 7604666
Author: Yash Datta <yash.da...@guavus.com>
Authored: Mon Dec 29 13:49:45 2014 -0800
Committer: Reynold Xin <r...@databricks.com>
Committed: Mon Dec 29 13:50:34 2014 -0800

--
 core/src/main/scala/org/apache/spark/rdd/RDD.scala | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/e81c8696/core/src/main/scala/org/apache/spark/rdd/RDD.scala
--
diff --git a/core/src/main/scala/org/apache/spark/rdd/RDD.scala b/core/src/main/scala/org/apache/spark/rdd/RDD.scala
index ff6d946..c26425d 100644
--- a/core/src/main/scala/org/apache/spark/rdd/RDD.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/RDD.scala
@@ -1132,15 +1132,20 @@ abstract class RDD[T: ClassTag](
 if (num == 0) {
   Array.empty
 } else {
-  mapPartitions { items =>
+  val mapRDDs = mapPartitions { items =>
 // Priority keeps the largest elements, so let's reverse the ordering.
 val queue = new BoundedPriorityQueue[T](num)(ord.reverse)
 queue ++= util.collection.Utils.takeOrdered(items, num)(ord)
 Iterator.single(queue)
-  }.reduce { (queue1, queue2) =>
-queue1 ++= queue2
-queue1
-  }.toArray.sorted(ord)
+  }
+  if (mapRDDs.partitions.size == 0) {
+Array.empty
+  } else {
+mapRDDs.reduce { (queue1, queue2) =>
+  queue1 ++= queue2
+  queue1
+}.toArray.sorted(ord)
+  }
 }
   }
 





spark git commit: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-29 Thread meng
Repository: spark
Updated Branches:
  refs/heads/master 9bc0df680 -> 6cf6fdf3f


SPARK-4156 [MLLIB] EM algorithm for GMMs

Implementation of Expectation-Maximization for Gaussian Mixture Models.

This is my maiden contribution to Apache Spark, so I apologize now if I have 
done anything incorrectly; having said that, this work is my own, and I offer 
it to the project under the project's open source license.
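For reference, the updates the EM iteration alternates for a K-component Gaussian mixture are the standard ones (a sketch of the math, not of the code):

```latex
% E-step: responsibility of component k for point x_i
\gamma_{ik} = \frac{w_k \,\mathcal{N}(x_i \mid \mu_k, \Sigma_k)}
                   {\sum_{j=1}^{K} w_j \,\mathcal{N}(x_i \mid \mu_j, \Sigma_j)}

% M-step: re-estimate weights, means, and covariances from the responsibilities
w_k = \frac{1}{n}\sum_{i=1}^{n}\gamma_{ik}, \qquad
\mu_k = \frac{\sum_i \gamma_{ik}\, x_i}{\sum_i \gamma_{ik}}, \qquad
\Sigma_k = \frac{\sum_i \gamma_{ik}\,(x_i-\mu_k)(x_i-\mu_k)^{\top}}{\sum_i \gamma_{ik}}
```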

Author: Travis Galoppo <tjg2...@columbia.edu>
Author: Travis Galoppo <travis@localhost.localdomain>
Author: tgaloppo <tjg2...@columbia.edu>
Author: FlytxtRnD <meethu.mat...@flytxt.com>

Closes #3022 from tgaloppo/master and squashes the following commits:

aaa8f25 [Travis Galoppo] MLUtils: changed privacy of EPSILON from [util] to 
[mllib]
709e4bf [Travis Galoppo] fixed usage line to include optional maxIterations 
parameter
acf1fba [Travis Galoppo] Fixed parameter comment in GaussianMixtureModel Made 
maximum iterations an optional parameter to DenseGmmEM
9b2fc2a [Travis Galoppo] Style improvements Changed ExpectationSum to a private 
class
b97fe00 [Travis Galoppo] Minor fixes and tweaks.
1de73f3 [Travis Galoppo] Removed redundant array from array creation
578c2d1 [Travis Galoppo] Removed unused import
227ad66 [Travis Galoppo] Moved prediction methods into model class.
308c8ad [Travis Galoppo] Numerous changes to improve code
cff73e0 [Travis Galoppo] Replaced accumulators with RDD.aggregate
20ebca1 [Travis Galoppo] Removed unused code
42b2142 [Travis Galoppo] Added functionality to allow setting of GMM starting 
point. Added two cluster test to testing suite.
8b633f3 [Travis Galoppo] Style issue
9be2534 [Travis Galoppo] Style issue
d695034 [Travis Galoppo] Fixed style issues
c3b8ce0 [Travis Galoppo] Merge branch 'master' of 
https://github.com/tgaloppo/spark   Adds predict() method
2df336b [Travis Galoppo] Fixed style issue
b99ecc4 [tgaloppo] Merge pull request #1 from FlytxtRnD/predictBranch
f407b4c [FlytxtRnD] Added predict() to return the cluster labels and membership 
values
97044cf [Travis Galoppo] Fixed style issues
dc9c742 [Travis Galoppo] Moved MultivariateGaussian utility class
e7d413b [Travis Galoppo] Moved multivariate Gaussian utility class to 
mllib/stat/impl Improved comments
9770261 [Travis Galoppo] Corrected a variety of style and naming issues.
8aaa17d [Travis Galoppo] Added additional train() method to companion object 
for cluster count and tolerance parameters.
676e523 [Travis Galoppo] Fixed to no longer ignore delta value provided on 
command line
e6ea805 [Travis Galoppo] Merged with master branch; update test suite with 
latest context changes. Improved cluster initialization strategy.
86fb382 [Travis Galoppo] Merge remote-tracking branch 'upstream/master'
719d8cc [Travis Galoppo] Added scala test suite with basic test
c1a8e16 [Travis Galoppo] Made GaussianMixtureModel class serializable Modified 
sum function for better performance
5c96c57 [Travis Galoppo] Merge remote-tracking branch 'upstream/master'
c15405c [Travis Galoppo] SPARK-4156


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6cf6fdf3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6cf6fdf3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6cf6fdf3

Branch: refs/heads/master
Commit: 6cf6fdf3ff5d1cf33c2dc28f039adc4d7c0f0464
Parents: 9bc0df6
Author: Travis Galoppo <tjg2...@columbia.edu>
Authored: Mon Dec 29 15:29:15 2014 -0800
Committer: Xiangrui Meng <m...@databricks.com>
Committed: Mon Dec 29 15:29:15 2014 -0800

--
 .../spark/examples/mllib/DenseGmmEM.scala   |  67 ++
 .../mllib/clustering/GaussianMixtureEM.scala| 241 +++
 .../mllib/clustering/GaussianMixtureModel.scala |  91 +++
 .../mllib/stat/impl/MultivariateGaussian.scala  |  39 +++
 .../org/apache/spark/mllib/util/MLUtils.scala   |   2 +-
 .../GMMExpectationMaximizationSuite.scala   |  78 ++
 6 files changed, 517 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6cf6fdf3/examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala
--
diff --git a/examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala b/examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala
new file mode 100644
index 000..948c350
--- /dev/null
+++ b/examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance 

spark git commit: Added setMinCount to Word2Vec.scala

2014-12-29 Thread meng
Repository: spark
Updated Branches:
  refs/heads/master 6cf6fdf3f -> 343db392b


Added setMinCount to Word2Vec.scala

Wanted to customize the private minCount variable in the Word2Vec class. Added
a setter method to do so; a short usage sketch follows.
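A short usage sketch (the corpus and the value 10 are illustrative):

```scala
import org.apache.spark.mllib.feature.{Word2Vec, Word2VecModel}
import org.apache.spark.rdd.RDD

def train(corpus: RDD[Seq[String]]): Word2VecModel = {
  new Word2Vec()
    .setVectorSize(100)
    .setMinCount(10)   // previously hard-coded to 5
    .fit(corpus)       // tokens seen fewer than 10 times are dropped from the vocabulary
}
```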

Author: ganonp <gan...@gmail.com>

Closes #3693 from ganonp/my-custom-spark and squashes the following commits:

ad534f2 [ganonp] made norm method public
5110a6f [ganonp] Reorganized
854958b [ganonp] Fixed Indentation for setMinCount
12ed8f9 [ganonp] Update Word2Vec.scala
76bdf5a [ganonp] Update Word2Vec.scala
ffb88bb [ganonp] Update Word2Vec.scala
5eb9100 [ganonp] Added setMinCount to Word2Vec.scala


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/343db392
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/343db392
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/343db392

Branch: refs/heads/master
Commit: 343db392b58fb33a3e4bc6fda1da69aaf686b5a9
Parents: 6cf6fdf
Author: ganonp <gan...@gmail.com>
Authored: Mon Dec 29 15:31:19 2014 -0800
Committer: Xiangrui Meng <m...@databricks.com>
Committed: Mon Dec 29 15:31:19 2014 -0800

--
 .../org/apache/spark/mllib/feature/Word2Vec.scala| 15 +++
 .../org/apache/spark/mllib/linalg/Vectors.scala  |  2 +-
 2 files changed, 12 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/343db392/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala b/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala
index 7960f3c..d25a7cd 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala
@@ -71,7 +71,8 @@ class Word2Vec extends Serializable with Logging {
   private var numPartitions = 1
   private var numIterations = 1
   private var seed = Utils.random.nextLong()
-
+  private var minCount = 5
+  
   /**
* Sets vector size (default: 100).
*/
@@ -114,6 +115,15 @@ class Word2Vec extends Serializable with Logging {
 this
   }
 
+  /** 
+   * Sets minCount, the minimum number of times a token must appear to be included in the word2vec
+   * model's vocabulary (default: 5).
+   */
+  def setMinCount(minCount: Int): this.type = {
+this.minCount = minCount
+this
+  }
+  
   private val EXP_TABLE_SIZE = 1000
   private val MAX_EXP = 6
   private val MAX_CODE_LENGTH = 40
@@ -122,9 +132,6 @@ class Word2Vec extends Serializable with Logging {
   /** context words from [-window, window] */
   private val window = 5
 
-  /** minimum frequency to consider a vocabulary word */
-  private val minCount = 5
-
   private var trainWordsCount = 0
   private var vocabSize = 0
   private var vocab: Array[VocabWord] = null

http://git-wip-us.apache.org/repos/asf/spark/blob/343db392/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
index 47d1a76..01f3f90 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
@@ -268,7 +268,7 @@ object Vectors {
* @param p norm.
* @return norm in L^p^ space.
*/
-  private[spark] def norm(vector: Vector, p: Double): Double = {
+  def norm(vector: Vector, p: Double): Double = {
 require(p >= 1.0)
 val values = vector match {
   case dv: DenseVector => dv.values





spark git commit: [SPARK-4972][MLlib] Updated the scala doc for lasso and ridge regression for the change of LeastSquaresGradient

2014-12-29 Thread meng
Repository: spark
Updated Branches:
  refs/heads/master 343db392b -> 040d6f2d1


[SPARK-4972][MLlib] Updated the scala doc for lasso and ridge regression for 
the change of LeastSquaresGradient

In SPARK-4907, we added a factor of 2 to the LeastSquaresGradient. This updates
the Scala doc for lasso and ridge regression accordingly.
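The reason for the 1/2n scaling, in one line: the 1/2 cancels the 2 produced by differentiating the square, so the gradient comes out clean:

```latex
f(w) = \frac{1}{2n}\,\lVert A w - y \rVert^2 + R(w)
\quad\Longrightarrow\quad
\nabla_w f(w) = \frac{1}{n}\, A^{\top} (A w - y) + \nabla R(w)
```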

Author: DB Tsai <dbt...@alpinenow.com>

Closes #3808 from dbtsai/doc and squashes the following commits:

ec3c989 [DB Tsai] first commit


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/040d6f2d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/040d6f2d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/040d6f2d

Branch: refs/heads/master
Commit: 040d6f2d13b132b3ef2a1e4f12f9f0e781c5a0b8
Parents: 343db39
Author: DB Tsai <dbt...@alpinenow.com>
Authored: Mon Dec 29 17:17:12 2014 -0800
Committer: Xiangrui Meng <m...@databricks.com>
Committed: Mon Dec 29 17:17:12 2014 -0800

--
 mllib/src/main/scala/org/apache/spark/mllib/regression/Lasso.scala | 2 +-
 .../scala/org/apache/spark/mllib/regression/RidgeRegression.scala  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/040d6f2d/mllib/src/main/scala/org/apache/spark/mllib/regression/Lasso.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/regression/Lasso.scala b/mllib/src/main/scala/org/apache/spark/mllib/regression/Lasso.scala
index f9791c6..8ecd5c6 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/regression/Lasso.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/regression/Lasso.scala
@@ -45,7 +45,7 @@ class LassoModel (
 /**
  * Train a regression model with L1-regularization using Stochastic Gradient Descent.
  * This solves the l1-regularized least squares regression formulation
- *  f(weights) = 1/n ||A weights-y||^2  + regParam ||weights||_1
+ *  f(weights) = 1/2n ||A weights-y||^2  + regParam ||weights||_1
  * Here the data matrix has n rows, and the input RDD holds the set of rows of A, each with
  * its corresponding right hand side label y.
  * See also the documentation for the precise formulation.

http://git-wip-us.apache.org/repos/asf/spark/blob/040d6f2d/mllib/src/main/scala/org/apache/spark/mllib/regression/RidgeRegression.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/regression/RidgeRegression.scala b/mllib/src/main/scala/org/apache/spark/mllib/regression/RidgeRegression.scala
index c8cad77..076ba35 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/regression/RidgeRegression.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/regression/RidgeRegression.scala
@@ -45,7 +45,7 @@ class RidgeRegressionModel (
 /**
  * Train a regression model with L2-regularization using Stochastic Gradient Descent.
  * This solves the l1-regularized least squares regression formulation
- *  f(weights) = 1/n ||A weights-y||^2  + regParam/2 ||weights||^2
+ *  f(weights) = 1/2n ||A weights-y||^2  + regParam/2 ||weights||^2
  * Here the data matrix has n rows, and the input RDD holds the set of rows of A, each with
  * its corresponding right hand side label y.
  * See also the documentation for the precise formulation.





spark git commit: [HOTFIX] Add SPARK_VERSION to Spark package object.

2014-12-29 Thread joshrosen
Repository: spark
Updated Branches:
  refs/heads/branch-1.0 44719e636 -> 78157d494


[HOTFIX] Add SPARK_VERSION to Spark package object.

This helps to avoid build breaks when backporting patches that use
org.apache.spark.SPARK_VERSION.
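A tiny sketch of the constant in use (the version check is illustrative):

```scala
import org.apache.spark.SPARK_VERSION

// Backported code can branch on the running version without build-time plumbing.
if (SPARK_VERSION.startsWith("1.0")) {
  println(s"running on a 1.0.x build: $SPARK_VERSION")
}
```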


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/78157d49
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/78157d49
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/78157d49

Branch: refs/heads/branch-1.0
Commit: 78157d4943def7a4fdb718a52f5a73d5128ca538
Parents: 44719e6
Author: Josh Rosen <joshro...@databricks.com>
Authored: Mon Dec 29 22:00:13 2014 -0800
Committer: Josh Rosen <joshro...@databricks.com>
Committed: Mon Dec 29 22:00:13 2014 -0800

--
 core/src/main/scala/org/apache/spark/package.scala | 1 +
 1 file changed, 1 insertion(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/78157d49/core/src/main/scala/org/apache/spark/package.scala
--
diff --git a/core/src/main/scala/org/apache/spark/package.scala b/core/src/main/scala/org/apache/spark/package.scala
index 5cdbc30..fb3a4c5 100644
--- a/core/src/main/scala/org/apache/spark/package.scala
+++ b/core/src/main/scala/org/apache/spark/package.scala
@@ -44,4 +44,5 @@ package org.apache
 
 package object spark {
   // For package docs only
+  val SPARK_VERSION = "1.0.3-SNAPSHOT"
 }





spark git commit: [HOTFIX] Add SPARK_VERSION to Spark package object.

2014-12-29 Thread joshrosen
Repository: spark
Updated Branches:
  refs/heads/branch-1.1 3442b7bb6 -> d5e0a45ed


[HOTFIX] Add SPARK_VERSION to Spark package object.

This helps to avoid build breaks when backporting patches that use
org.apache.spark.SPARK_VERSION.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d5e0a45e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d5e0a45e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d5e0a45e

Branch: refs/heads/branch-1.1
Commit: d5e0a45edb00038f7f2aba3a34bcee1117bd98d9
Parents: 3442b7b
Author: Josh Rosen <joshro...@databricks.com>
Authored: Mon Dec 29 22:03:07 2014 -0800
Committer: Josh Rosen <joshro...@databricks.com>
Committed: Mon Dec 29 22:03:07 2014 -0800

--
 core/src/main/scala/org/apache/spark/package.scala | 1 +
 1 file changed, 1 insertion(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/d5e0a45e/core/src/main/scala/org/apache/spark/package.scala
--
diff --git a/core/src/main/scala/org/apache/spark/package.scala b/core/src/main/scala/org/apache/spark/package.scala
index 5cdbc30..8caff7e 100644
--- a/core/src/main/scala/org/apache/spark/package.scala
+++ b/core/src/main/scala/org/apache/spark/package.scala
@@ -44,4 +44,5 @@ package org.apache
 
 package object spark {
   // For package docs only
+  val SPARK_VERSION = "1.1.2-SNAPSHOT"
 }

