spark git commit: [SPARK-4966][YARN] The MemoryOverhead value is not set correctly
Repository: spark Updated Branches: refs/heads/branch-1.2 23d64cf08 -> 2cd446a90 [SPARK-4966][YARN] The MemoryOverhead value is not set correctly Author: meiyoula 1039320...@qq.com Closes #3797 from XuTingjun/MemoryOverhead and squashes the following commits: 5a780fc [meiyoula] Update ClientArguments.scala (cherry picked from commit 14fa87bdf4b89cd392270864ee063ce01bd31887) Signed-off-by: Thomas Graves tgra...@apache.org Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2cd446a9 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2cd446a9 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2cd446a9 Branch: refs/heads/branch-1.2 Commit: 2cd446a90216ac8eb19584c760685fbb470c4301 Parents: 23d64cf Author: meiyoula 1039320...@qq.com Authored: Mon Dec 29 08:20:30 2014 -0600 Committer: Thomas Graves tgra...@apache.org Committed: Mon Dec 29 08:21:19 2014 -0600 -- .../main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2cd446a9/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala -- diff --git a/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala b/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala index 4d85945..7687a9b 100644 --- a/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala +++ b/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala @@ -39,6 +39,8 @@ private[spark] class ClientArguments(args: Array[String], sparkConf: SparkConf) var appName: String = "Spark" var priority = 0 + parseArgs(args.toList) + // Additional memory to allocate to containers // For now, use driver's memory overhead as our AM container's memory overhead val amMemoryOverhead = sparkConf.getInt("spark.yarn.driver.memoryOverhead", @@ -50,7 +52,6 @@ private[spark] class ClientArguments(args: Array[String], sparkConf: SparkConf) private val isDynamicAllocationEnabled = sparkConf.getBoolean("spark.dynamicAllocation.enabled", false) - parseArgs(args.toList) loadEnvironmentArgs() validateArgs() - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
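The patch moves `parseArgs(args.toList)` above the memory-overhead vals, presumably so that values parsed from the command line are available when the subsequent vals initialize — in a Scala class body, statements run top to bottom. A minimal sketch of the same pitfall, using hypothetical names rather than the Spark code:

```scala
// Sketch (hypothetical class, not Spark's ClientArguments): a val derived
// from state that parse() mutates must be initialized AFTER parse() has run.
class Args(args: List[String]) {
  private var overheadFromArgs: Option[Int] = None

  parse(args) // must run before amMemoryOverhead below is initialized

  // Uses the parsed value when present, otherwise a default.
  val amMemoryOverhead: Int = overheadFromArgs.getOrElse(384)

  private def parse(as: List[String]): Unit = as match {
    case "--memory-overhead" :: value :: rest =>
      overheadFromArgs = Some(value.toInt)
      parse(rest)
    case _ :: rest => parse(rest)
    case Nil       => ()
  }
}
```

If `parse(args)` were placed after the `val`, `amMemoryOverhead` would always see the default — the essence of the bug being fixed.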
spark git commit: [SPARK-4982][DOC] `spark.ui.retainedJobs` description is wrong in Spark UI configuration guide
Repository: spark Updated Branches: refs/heads/branch-1.2 2cd446a90 -> 76046664d [SPARK-4982][DOC] `spark.ui.retainedJobs` description is wrong in Spark UI configuration guide Author: wangxiaojing u9j...@gmail.com Closes #3818 from wangxiaojing/SPARK-4982 and squashes the following commits: fe2ad5f [wangxiaojing] change stages to jobs (cherry picked from commit 6645e52580747990321e22340ae742f26d2f2504) Signed-off-by: Josh Rosen joshro...@databricks.com Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/76046664 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/76046664 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/76046664 Branch: refs/heads/branch-1.2 Commit: 76046664dc9bd830b10c9e4786c211b4407a81e0 Parents: 2cd446a Author: wangxiaojing u9j...@gmail.com Authored: Mon Dec 29 10:45:14 2014 -0800 Committer: Josh Rosen joshro...@databricks.com Committed: Mon Dec 29 10:46:13 2014 -0800 -- docs/configuration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/76046664/docs/configuration.md -- diff --git a/docs/configuration.md b/docs/configuration.md index 60fde13..d0fbf1a 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -452,7 +452,7 @@ Apart from these, the following properties are also available, and may be useful <td><code>spark.ui.retainedJobs</code></td> <td>1000</td> <td> -How many stages the Spark UI and status APIs remember before garbage +How many jobs the Spark UI and status APIs remember before garbage collecting. </td> </tr>
spark git commit: [SPARK-4946] [CORE] Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to reduce the chance of the communicating problem
Repository: spark Updated Branches: refs/heads/master 4cef05e1c - 815de5400 [SPARK-4946] [CORE] Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to reduce the chance of the communicating problem Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to reduce the chance of the communicating problem Author: YanTangZhai hakeemz...@tencent.com Author: yantangzhai tyz0...@163.com Closes #3785 from YanTangZhai/SPARK-4946 and squashes the following commits: 9ca6541 [yantangzhai] [SPARK-4946] [CORE] Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to reduce the chance of the communicating problem e4c2c0a [YanTangZhai] Merge pull request #15 from apache/master 718afeb [YanTangZhai] Merge pull request #12 from apache/master 6e643f8 [YanTangZhai] Merge pull request #11 from apache/master e249846 [YanTangZhai] Merge pull request #10 from apache/master d26d982 [YanTangZhai] Merge pull request #9 from apache/master 76d4027 [YanTangZhai] Merge pull request #8 from apache/master 03b62b0 [YanTangZhai] Merge pull request #7 from apache/master 8a00106 [YanTangZhai] Merge pull request #6 from apache/master cbcba66 [YanTangZhai] Merge pull request #3 from apache/master cdef539 [YanTangZhai] Merge pull request #1 from apache/master Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/815de540 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/815de540 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/815de540 Branch: refs/heads/master Commit: 815de54002f9c1cfedc398e95896fa207b4a5305 Parents: 4cef05e Author: YanTangZhai hakeemz...@tencent.com Authored: Mon Dec 29 11:30:54 2014 -0800 Committer: Josh Rosen joshro...@databricks.com Committed: Mon Dec 29 11:30:54 2014 -0800 -- core/src/main/scala/org/apache/spark/MapOutputTracker.scala | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) -- 
http://git-wip-us.apache.org/repos/asf/spark/blob/815de540/core/src/main/scala/org/apache/spark/MapOutputTracker.scala -- diff --git a/core/src/main/scala/org/apache/spark/MapOutputTracker.scala b/core/src/main/scala/org/apache/spark/MapOutputTracker.scala index a074ab8..6e4edc7 100644 --- a/core/src/main/scala/org/apache/spark/MapOutputTracker.scala +++ b/core/src/main/scala/org/apache/spark/MapOutputTracker.scala @@ -76,6 +76,8 @@ private[spark] class MapOutputTrackerMasterActor(tracker: MapOutputTrackerMaster */ private[spark] abstract class MapOutputTracker(conf: SparkConf) extends Logging { private val timeout = AkkaUtils.askTimeout(conf) + private val retryAttempts = AkkaUtils.numRetries(conf) + private val retryIntervalMs = AkkaUtils.retryWaitMs(conf) /** Set to the MapOutputTrackerActor living on the driver. */ var trackerActor: ActorRef = _ @@ -108,8 +110,7 @@ private[spark] abstract class MapOutputTracker(conf: SparkConf) extends Logging */ protected def askTracker(message: Any): Any = { try { - val future = trackerActor.ask(message)(timeout) - Await.result(future, timeout) + AkkaUtils.askWithReply(message, trackerActor, retryAttempts, retryIntervalMs, timeout) } catch { case e: Exception => logError("Error communicating with MapOutputTracker", e)
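The change replaces a single `ask` with `AkkaUtils.askWithReply`, which retries the request a bounded number of times. A self-contained sketch of that retry pattern (a hypothetical helper, not the AkkaUtils implementation):

```scala
import scala.util.{Failure, Success, Try}

// Sketch of an ask-with-retry loop: attempt the request up to `attempts`
// times, pausing `waitMs` between failures, and surface the last error if
// every attempt fails.
def askWithRetry[T](attempts: Int, waitMs: Long)(ask: => T): T = {
  var lastError: Throwable = null
  var attempt = 1
  while (attempt <= attempts) {
    Try(ask) match {
      case Success(reply) => return reply
      case Failure(e) =>
        lastError = e
        if (attempt < attempts) Thread.sleep(waitMs)
    }
    attempt += 1
  }
  throw new RuntimeException(s"Failed after $attempts attempts", lastError)
}
```

Bounded retries with a wait between attempts reduce the chance that one dropped message fails the whole lookup, which is the stated motivation of the commit.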
spark git commit: [Minor] Fix a typo of type parameter in JavaUtils.scala
Repository: spark Updated Branches: refs/heads/master 815de5400 -> 8d72341ab [Minor] Fix a typo of type parameter in JavaUtils.scala In JavaUtils.scala, there is a typo in a type parameter. In addition, the type information is erased at compile time anyway. This issue is really minor, so I didn't file it in JIRA. Author: Kousuke Saruta saru...@oss.nttdata.co.jp Closes #3789 from sarutak/fix-typo-in-javautils and squashes the following commits: e20193d [Kousuke Saruta] Fixed a typo of type parameter 82bc5d9 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-typo-in-javautils 99f6f63 [Kousuke Saruta] Fixed a typo of type parameter in JavaUtils.scala Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8d72341a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8d72341a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8d72341a Branch: refs/heads/master Commit: 8d72341ab75a7fb138b056cfb4e21db42aca55fb Parents: 815de54 Author: Kousuke Saruta saru...@oss.nttdata.co.jp Authored: Mon Dec 29 12:05:08 2014 -0800 Committer: Reynold Xin r...@databricks.com Committed: Mon Dec 29 12:05:08 2014 -0800 -- core/src/main/scala/org/apache/spark/api/java/JavaUtils.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8d72341a/core/src/main/scala/org/apache/spark/api/java/JavaUtils.scala -- diff --git a/core/src/main/scala/org/apache/spark/api/java/JavaUtils.scala b/core/src/main/scala/org/apache/spark/api/java/JavaUtils.scala index 86e9493..71b2673 100644 --- a/core/src/main/scala/org/apache/spark/api/java/JavaUtils.scala +++ b/core/src/main/scala/org/apache/spark/api/java/JavaUtils.scala @@ -80,7 +80,7 @@ private[spark] object JavaUtils { prev match { case Some(k) => underlying match { -case mm: mutable.Map[a, _] => +case mm: mutable.Map[A, _] => mm remove k prev = None case _ =>
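Why the typo was only cosmetic: type arguments in a pattern such as `mutable.Map[A, _]` are erased, so the match succeeds for any `mutable.Map` at runtime (the compiler emits an "unchecked" warning either way); a lowercase `a` merely binds a fresh type variable. A small sketch of the pattern in question (illustrative, not the JavaUtils code):

```scala
import scala.collection.mutable

// Matching on a parameterized type: the [A, _] part is unchecked due to
// erasure, so this case matches any mutable.Map regardless of key type.
// Using the enclosing type parameter A (rather than a fresh lowercase `a`)
// just makes the intent readable.
def removeKey[A](underlying: Any, k: A): Unit =
  underlying match {
    case mm: mutable.Map[A, _] => mm remove k // unchecked at runtime
    case _                     => ()
  }
```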
spark git commit: [SPARK-4409][MLlib] Additional Linear Algebra Utils
Repository: spark Updated Branches: refs/heads/master 8d72341ab -> 02b55de3d [SPARK-4409][MLlib] Additional Linear Algebra Utils Addition of a very limited number of local matrix manipulation and generation methods that would be helpful for further development of algorithms on top of BlockMatrix (SPARK-3974), such as Randomized SVD and Multi Model Training (SPARK-1486). The proposed methods for addition are: For `Matrix` - map: maps the values in the matrix with a given function. Produces a new matrix. - update: the values in the matrix are updated with a given function. Occurs in place. Factory methods for `DenseMatrix`: - *zeros: Generate a matrix consisting of zeros - *ones: Generate a matrix consisting of ones - *eye: Generate an identity matrix - *rand: Generate a matrix consisting of i.i.d. uniform random numbers - *randn: Generate a matrix consisting of i.i.d. Gaussian random numbers - *diag: Generate a diagonal matrix from a supplied vector *These methods already exist in the factory methods for `Matrices`, however for cases where we require a `DenseMatrix`, you constantly have to add `.asInstanceOf[DenseMatrix]` everywhere, which makes the code dirtier. I propose moving these functions to factory methods for `DenseMatrix` where the output will be a `DenseMatrix` and the factory methods for `Matrices` will call these functions directly and output a generic `Matrix`. Factory methods for `SparseMatrix`: - speye: Identity matrix in sparse format. Saves a ton of memory when dimensions are large, especially in Multi Model Training, where each row requires being multiplied by a scalar. - sprand: Generate a sparse matrix with a given density consisting of i.i.d. uniform random numbers. - sprandn: Generate a sparse matrix with a given density consisting of i.i.d. Gaussian random numbers. - diag: Generate a diagonal matrix from a supplied vector, but is memory efficient, because it just stores the diagonal. Again, very helpful in Multi Model Training.
Factory methods for `Matrices`: - Include all the factory methods given above, but return a generic `Matrix` rather than `SparseMatrix` or `DenseMatrix`. - horzCat: Horizontally concatenate matrices to form one larger matrix. Very useful in both Multi Model Training, and for the repartitioning of BlockMatrix. - vertCat: Vertically concatenate matrices to form one larger matrix. Very useful for the repartitioning of BlockMatrix. The names for these methods were selected from MATLAB Author: Burak Yavuz brk...@gmail.com Author: Xiangrui Meng m...@databricks.com Closes #3319 from brkyvz/SPARK-4409 and squashes the following commits: b0354f6 [Burak Yavuz] [SPARK-4409] Incorporated mengxr's code 04c4829 [Burak Yavuz] Merge pull request #1 from mengxr/SPARK-4409 80cfa29 [Xiangrui Meng] minor changes ecc937a [Xiangrui Meng] update sprand 4e95e24 [Xiangrui Meng] simplify fromCOO implementation 10a63a6 [Burak Yavuz] [SPARK-4409] Fourth pass of code review f62d6c7 [Burak Yavuz] [SPARK-4409] Modified genRandMatrix 3971c93 [Burak Yavuz] [SPARK-4409] Third pass of code review 75239f8 [Burak Yavuz] [SPARK-4409] Second pass of code review e4bd0c0 [Burak Yavuz] [SPARK-4409] Modified horzcat and vertcat 65c562e [Burak Yavuz] [SPARK-4409] Hopefully fixed Java Test d8be7bc [Burak Yavuz] [SPARK-4409] Organized imports 065b531 [Burak Yavuz] [SPARK-4409] First pass after code review a8120d2 [Burak Yavuz] [SPARK-4409] Finished updates to API according to SPARK-4614 f798c82 [Burak Yavuz] [SPARK-4409] Updated API according to SPARK-4614 c75f3cd [Burak Yavuz] [SPARK-4409] Added JavaAPI Tests, and fixed a couple of bugs d662f9d [Burak Yavuz] [SPARK-4409] Modified according to remote repo 83dfe37 [Burak Yavuz] [SPARK-4409] Scalastyle error fixed a14c0da [Burak Yavuz] [SPARK-4409] Initial commit to add methods Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/02b55de3 Tree: 
http://git-wip-us.apache.org/repos/asf/spark/tree/02b55de3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/02b55de3 Branch: refs/heads/master Commit: 02b55de3dce9a1fef806be13e5cefa0f39ea2fcc Parents: 8d72341 Author: Burak Yavuz brk...@gmail.com Authored: Mon Dec 29 13:24:26 2014 -0800 Committer: Xiangrui Meng m...@databricks.com Committed: Mon Dec 29 13:24:26 2014 -0800 -- .../apache/spark/mllib/linalg/Matrices.scala| 570 +-- .../spark/mllib/linalg/JavaMatricesSuite.java | 163 ++ .../spark/mllib/linalg/MatricesSuite.scala | 172 +- .../apache/spark/mllib/util/TestingUtils.scala | 6 +- 4 files changed, 868 insertions(+), 43 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/02b55de3/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
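As a rough illustration of what the `DenseMatrix` factory methods provide — this is a toy column-major implementation for the sketch, not the MLlib one:

```scala
// Minimal dense matrix with column-major storage (the layout mllib.linalg
// uses) and a few of the factory methods named in the commit message.
class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) {
  require(values.length == numRows * numCols)
  def apply(i: Int, j: Int): Double = values(j * numRows + i)
}

object DenseMatrix {
  def zeros(m: Int, n: Int): DenseMatrix = new DenseMatrix(m, n, Array.fill(m * n)(0.0))
  def ones(m: Int, n: Int): DenseMatrix = new DenseMatrix(m, n, Array.fill(m * n)(1.0))
  def eye(n: Int): DenseMatrix = {
    val mat = zeros(n, n)
    var i = 0
    while (i < n) { mat.values(i * n + i) = 1.0; i += 1 } // diagonal entries
    mat
  }
  def diag(v: Array[Double]): DenseMatrix = {
    val n = v.length
    val mat = zeros(n, n)
    var i = 0
    while (i < n) { mat.values(i * n + i) = v(i); i += 1 }
    mat
  }
}
```

The benefit described above is exactly this: `DenseMatrix.eye(3)` is statically a `DenseMatrix`, with no `.asInstanceOf[DenseMatrix]` cast needed.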
spark git commit: SPARK-4968: takeOrdered to skip reduce step in case mappers return no partitions
Repository: spark Updated Branches: refs/heads/master 02b55de3d - 9bc0df680 SPARK-4968: takeOrdered to skip reduce step in case mappers return no partitions takeOrdered should skip reduce step in case mapped RDDs have no partitions. This prevents the mentioned exception : 4. run query SELECT * FROM testTable WHERE market = 'market2' ORDER BY End_Time DESC LIMIT 100; Error trace java.lang.UnsupportedOperationException: empty collection at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863) at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.reduce(RDD.scala:863) at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1136) Author: Yash Datta yash.da...@guavus.com Closes #3830 from saucam/fix_takeorder and squashes the following commits: 5974d10 [Yash Datta] SPARK-4968: takeOrdered to skip reduce step in case mappers return no partitions Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9bc0df68 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9bc0df68 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9bc0df68 Branch: refs/heads/master Commit: 9bc0df6804f241aff24520d9c6ec54d9b11f5785 Parents: 02b55de Author: Yash Datta yash.da...@guavus.com Authored: Mon Dec 29 13:49:45 2014 -0800 Committer: Reynold Xin r...@databricks.com Committed: Mon Dec 29 13:49:45 2014 -0800 -- core/src/main/scala/org/apache/spark/rdd/RDD.scala | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/9bc0df68/core/src/main/scala/org/apache/spark/rdd/RDD.scala -- diff --git a/core/src/main/scala/org/apache/spark/rdd/RDD.scala b/core/src/main/scala/org/apache/spark/rdd/RDD.scala index f47c2d1..5118e2b 100644 --- a/core/src/main/scala/org/apache/spark/rdd/RDD.scala +++ b/core/src/main/scala/org/apache/spark/rdd/RDD.scala @@ -1146,15 +1146,20 @@ abstract class RDD[T: 
ClassTag]( if (num == 0) { Array.empty } else { - mapPartitions { items => + val mapRDDs = mapPartitions { items => // Priority keeps the largest elements, so let's reverse the ordering. val queue = new BoundedPriorityQueue[T](num)(ord.reverse) queue ++= util.collection.Utils.takeOrdered(items, num)(ord) Iterator.single(queue) - }.reduce { (queue1, queue2) => -queue1 ++= queue2 -queue1 - }.toArray.sorted(ord) + } + if (mapRDDs.partitions.size == 0) { +Array.empty + } else { +mapRDDs.reduce { (queue1, queue2) => + queue1 ++= queue2 + queue1 +}.toArray.sorted(ord) + } } }
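The same logic can be exercised without Spark. Below is a plain-Scala sketch of the top-k pattern — bounded per-partition queues, a merge, and the new empty-partitions short-circuit — using `mutable.PriorityQueue` in place of Spark's `BoundedPriorityQueue` (illustrative only, not the RDD code):

```scala
import scala.collection.mutable

// Keep at most `num` elements per partition in a max-heap, merge the heaps,
// and short-circuit when there are no partitions -- the case this fix guards,
// which previously hit reduce's "empty collection" error.
def takeOrderedLocal[T](partitions: Seq[Iterator[T]], num: Int)(implicit ord: Ordering[T]): List[T] = {
  val queues = partitions.map { items =>
    val queue = mutable.PriorityQueue.empty[T](ord) // dequeues the largest first
    items.foreach { x =>
      queue.enqueue(x)
      if (queue.size > num) queue.dequeue() // evict the largest, keeping the num smallest
    }
    queue
  }
  if (queues.isEmpty) {
    Nil // mirrors the new mapRDDs.partitions.size == 0 branch
  } else {
    val merged = queues.reduce { (queue1, queue2) =>
      queue2.foreach { x =>
        queue1.enqueue(x)
        if (queue1.size > num) queue1.dequeue()
      }
      queue1
    }
    merged.toList.sorted(ord)
  }
}
```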
spark git commit: SPARK-4968: takeOrdered to skip reduce step in case mappers return no partitions
Repository: spark Updated Branches: refs/heads/branch-1.2 76046664d - e81c86967 SPARK-4968: takeOrdered to skip reduce step in case mappers return no partitions takeOrdered should skip reduce step in case mapped RDDs have no partitions. This prevents the mentioned exception : 4. run query SELECT * FROM testTable WHERE market = 'market2' ORDER BY End_Time DESC LIMIT 100; Error trace java.lang.UnsupportedOperationException: empty collection at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863) at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.reduce(RDD.scala:863) at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1136) Author: Yash Datta yash.da...@guavus.com Closes #3830 from saucam/fix_takeorder and squashes the following commits: 5974d10 [Yash Datta] SPARK-4968: takeOrdered to skip reduce step in case mappers return no partitions (cherry picked from commit 9bc0df6804f241aff24520d9c6ec54d9b11f5785) Signed-off-by: Reynold Xin r...@databricks.com Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e81c8696 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e81c8696 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e81c8696 Branch: refs/heads/branch-1.2 Commit: e81c869677b566dfcabedca89a40aeea7dc16fa9 Parents: 7604666 Author: Yash Datta yash.da...@guavus.com Authored: Mon Dec 29 13:49:45 2014 -0800 Committer: Reynold Xin r...@databricks.com Committed: Mon Dec 29 13:50:34 2014 -0800 -- core/src/main/scala/org/apache/spark/rdd/RDD.scala | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e81c8696/core/src/main/scala/org/apache/spark/rdd/RDD.scala -- diff --git a/core/src/main/scala/org/apache/spark/rdd/RDD.scala b/core/src/main/scala/org/apache/spark/rdd/RDD.scala index ff6d946..c26425d 100644 --- 
a/core/src/main/scala/org/apache/spark/rdd/RDD.scala +++ b/core/src/main/scala/org/apache/spark/rdd/RDD.scala @@ -1132,15 +1132,20 @@ abstract class RDD[T: ClassTag]( if (num == 0) { Array.empty } else { - mapPartitions { items => + val mapRDDs = mapPartitions { items => // Priority keeps the largest elements, so let's reverse the ordering. val queue = new BoundedPriorityQueue[T](num)(ord.reverse) queue ++= util.collection.Utils.takeOrdered(items, num)(ord) Iterator.single(queue) - }.reduce { (queue1, queue2) => -queue1 ++= queue2 -queue1 - }.toArray.sorted(ord) + } + if (mapRDDs.partitions.size == 0) { +Array.empty + } else { +mapRDDs.reduce { (queue1, queue2) => + queue1 ++= queue2 + queue1 +}.toArray.sorted(ord) + } } }
spark git commit: SPARK-4156 [MLLIB] EM algorithm for GMMs
Repository: spark Updated Branches: refs/heads/master 9bc0df680 - 6cf6fdf3f SPARK-4156 [MLLIB] EM algorithm for GMMs Implementation of Expectation-Maximization for Gaussian Mixture Models. This is my maiden contribution to Apache Spark, so I apologize now if I have done anything incorrectly; having said that, this work is my own, and I offer it to the project under the project's open source license. Author: Travis Galoppo tjg2...@columbia.edu Author: Travis Galoppo travis@localhost.localdomain Author: tgaloppo tjg2...@columbia.edu Author: FlytxtRnD meethu.mat...@flytxt.com Closes #3022 from tgaloppo/master and squashes the following commits: aaa8f25 [Travis Galoppo] MLUtils: changed privacy of EPSILON from [util] to [mllib] 709e4bf [Travis Galoppo] fixed usage line to include optional maxIterations parameter acf1fba [Travis Galoppo] Fixed parameter comment in GaussianMixtureModel Made maximum iterations an optional parameter to DenseGmmEM 9b2fc2a [Travis Galoppo] Style improvements Changed ExpectationSum to a private class b97fe00 [Travis Galoppo] Minor fixes and tweaks. 1de73f3 [Travis Galoppo] Removed redundant array from array creation 578c2d1 [Travis Galoppo] Removed unused import 227ad66 [Travis Galoppo] Moved prediction methods into model class. 308c8ad [Travis Galoppo] Numerous changes to improve code cff73e0 [Travis Galoppo] Replaced accumulators with RDD.aggregate 20ebca1 [Travis Galoppo] Removed unusued code 42b2142 [Travis Galoppo] Added functionality to allow setting of GMM starting point. Added two cluster test to testing suite. 
8b633f3 [Travis Galoppo] Style issue 9be2534 [Travis Galoppo] Style issue d695034 [Travis Galoppo] Fixed style issues c3b8ce0 [Travis Galoppo] Merge branch 'master' of https://github.com/tgaloppo/spark Adds predict() method 2df336b [Travis Galoppo] Fixed style issue b99ecc4 [tgaloppo] Merge pull request #1 from FlytxtRnD/predictBranch f407b4c [FlytxtRnD] Added predict() to return the cluster labels and membership values 97044cf [Travis Galoppo] Fixed style issues dc9c742 [Travis Galoppo] Moved MultivariateGaussian utility class e7d413b [Travis Galoppo] Moved multivariate Gaussian utility class to mllib/stat/impl Improved comments 9770261 [Travis Galoppo] Corrected a variety of style and naming issues. 8aaa17d [Travis Galoppo] Added additional train() method to companion object for cluster count and tolerance parameters. 676e523 [Travis Galoppo] Fixed to no longer ignore delta value provided on command line e6ea805 [Travis Galoppo] Merged with master branch; update test suite with latest context changes. Improved cluster initialization strategy. 
86fb382 [Travis Galoppo] Merge remote-tracking branch 'upstream/master' 719d8cc [Travis Galoppo] Added scala test suite with basic test c1a8e16 [Travis Galoppo] Made GaussianMixtureModel class serializable Modified sum function for better performance 5c96c57 [Travis Galoppo] Merge remote-tracking branch 'upstream/master' c15405c [Travis Galoppo] SPARK-4156 Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6cf6fdf3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6cf6fdf3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6cf6fdf3 Branch: refs/heads/master Commit: 6cf6fdf3ff5d1cf33c2dc28f039adc4d7c0f0464 Parents: 9bc0df6 Author: Travis Galoppo tjg2...@columbia.edu Authored: Mon Dec 29 15:29:15 2014 -0800 Committer: Xiangrui Meng m...@databricks.com Committed: Mon Dec 29 15:29:15 2014 -0800 -- .../spark/examples/mllib/DenseGmmEM.scala | 67 ++ .../mllib/clustering/GaussianMixtureEM.scala| 241 +++ .../mllib/clustering/GaussianMixtureModel.scala | 91 +++ .../mllib/stat/impl/MultivariateGaussian.scala | 39 +++ .../org/apache/spark/mllib/util/MLUtils.scala | 2 +- .../GMMExpectationMaximizationSuite.scala | 78 ++ 6 files changed, 517 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6cf6fdf3/examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala -- diff --git a/examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala b/examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala new file mode 100644 index 000..948c350 --- /dev/null +++ b/examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance
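For orientation, the step at the heart of the EM algorithm this commit implements is the E-step: each point's responsibility for a mixture component is its weighted density, normalized across components. A sketch in one dimension with hypothetical helper names (the MLlib code is multivariate and distributed):

```scala
// Univariate normal density N(x; mu, sigma^2).
def gaussianPdf(x: Double, mu: Double, sigma: Double): Double = {
  val z = (x - mu) / sigma
  math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.Pi))
}

// E-step for one point: responsibility of component k is
// w_k * N(x; mu_k, sigma_k^2), normalized to sum to one.
def responsibilities(x: Double, weights: Array[Double], mus: Array[Double], sigmas: Array[Double]): Array[Double] = {
  val weighted = Array.tabulate(weights.length) { k =>
    weights(k) * gaussianPdf(x, mus(k), sigmas(k))
  }
  val total = weighted.sum
  weighted.map(_ / total)
}
```

The M-step then re-estimates each component's weight, mean, and covariance from these responsibilities; in the commit this aggregation is done with `RDD.aggregate` rather than accumulators.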
spark git commit: Added setMinCount to Word2Vec.scala
Repository: spark Updated Branches: refs/heads/master 6cf6fdf3f - 343db392b Added setMinCount to Word2Vec.scala Wanted to customize the private minCount variable in the Word2Vec class. Added a method to do so. Author: ganonp gan...@gmail.com Closes #3693 from ganonp/my-custom-spark and squashes the following commits: ad534f2 [ganonp] made norm method public 5110a6f [ganonp] Reorganized 854958b [ganonp] Fixed Indentation for setMinCount 12ed8f9 [ganonp] Update Word2Vec.scala 76bdf5a [ganonp] Update Word2Vec.scala ffb88bb [ganonp] Update Word2Vec.scala 5eb9100 [ganonp] Added setMinCount to Word2Vec.scala Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/343db392 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/343db392 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/343db392 Branch: refs/heads/master Commit: 343db392b58fb33a3e4bc6fda1da69aaf686b5a9 Parents: 6cf6fdf Author: ganonp gan...@gmail.com Authored: Mon Dec 29 15:31:19 2014 -0800 Committer: Xiangrui Meng m...@databricks.com Committed: Mon Dec 29 15:31:19 2014 -0800 -- .../org/apache/spark/mllib/feature/Word2Vec.scala| 15 +++ .../org/apache/spark/mllib/linalg/Vectors.scala | 2 +- 2 files changed, 12 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/343db392/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala b/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala index 7960f3c..d25a7cd 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala @@ -71,7 +71,8 @@ class Word2Vec extends Serializable with Logging { private var numPartitions = 1 private var numIterations = 1 private var seed = Utils.random.nextLong() - + private var minCount = 5 + /** * Sets vector size (default: 100). 
*/ @@ -114,6 +115,15 @@ class Word2Vec extends Serializable with Logging { this } + /** + * Sets minCount, the minimum number of times a token must appear to be included in the word2vec + * model's vocabulary (default: 5). + */ + def setMinCount(minCount: Int): this.type = { +this.minCount = minCount +this + } + private val EXP_TABLE_SIZE = 1000 private val MAX_EXP = 6 private val MAX_CODE_LENGTH = 40 @@ -122,9 +132,6 @@ class Word2Vec extends Serializable with Logging { /** context words from [-window, window] */ private val window = 5 - /** minimum frequency to consider a vocabulary word */ - private val minCount = 5 - private var trainWordsCount = 0 private var vocabSize = 0 private var vocab: Array[VocabWord] = null http://git-wip-us.apache.org/repos/asf/spark/blob/343db392/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala index 47d1a76..01f3f90 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala @@ -268,7 +268,7 @@ object Vectors { * @param p norm. * @return norm in L^p^ space. */ - private[spark] def norm(vector: Vector, p: Double): Double = { + def norm(vector: Vector, p: Double): Double = { require(p >= 1.0) val values = vector match { case dv: DenseVector => dv.values
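The setter follows the builder style the class already uses: returning `this.type` lets calls chain. A minimal sketch (a hypothetical stripped-down class, not the MLlib one):

```scala
// A fluent setter: this.type means the call returns the receiver itself,
// so setters can be chained before training.
class Word2Vec {
  private var minCount = 5 // default: a token must appear at least 5 times

  def setMinCount(minCount: Int): this.type = {
    this.minCount = minCount
    this
  }

  def getMinCount: Int = minCount
}
```

With this commit, `new Word2Vec().setMinCount(2)` becomes possible where `minCount` was previously a fixed private val.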
spark git commit: [SPARK-4972][MLlib] Updated the scala doc for lasso and ridge regression for the change of LeastSquaresGradient
Repository: spark Updated Branches: refs/heads/master 343db392b - 040d6f2d1 [SPARK-4972][MLlib] Updated the scala doc for lasso and ridge regression for the change of LeastSquaresGradient In #SPARK-4907, we added factor of 2 into the LeastSquaresGradient. We updated the scala doc for lasso and ridge regression here. Author: DB Tsai dbt...@alpinenow.com Closes #3808 from dbtsai/doc and squashes the following commits: ec3c989 [DB Tsai] first commit Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/040d6f2d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/040d6f2d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/040d6f2d Branch: refs/heads/master Commit: 040d6f2d13b132b3ef2a1e4f12f9f0e781c5a0b8 Parents: 343db39 Author: DB Tsai dbt...@alpinenow.com Authored: Mon Dec 29 17:17:12 2014 -0800 Committer: Xiangrui Meng m...@databricks.com Committed: Mon Dec 29 17:17:12 2014 -0800 -- mllib/src/main/scala/org/apache/spark/mllib/regression/Lasso.scala | 2 +- .../scala/org/apache/spark/mllib/regression/RidgeRegression.scala | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/040d6f2d/mllib/src/main/scala/org/apache/spark/mllib/regression/Lasso.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/regression/Lasso.scala b/mllib/src/main/scala/org/apache/spark/mllib/regression/Lasso.scala index f9791c6..8ecd5c6 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/regression/Lasso.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/regression/Lasso.scala @@ -45,7 +45,7 @@ class LassoModel ( /** * Train a regression model with L1-regularization using Stochastic Gradient Descent. 
* This solves the l1-regularized least squares regression formulation - * f(weights) = 1/n ||A weights-y||^2 + regParam ||weights||_1 + * f(weights) = 1/2n ||A weights-y||^2 + regParam ||weights||_1 * Here the data matrix has n rows, and the input RDD holds the set of rows of A, each with * its corresponding right hand side label y. * See also the documentation for the precise formulation. http://git-wip-us.apache.org/repos/asf/spark/blob/040d6f2d/mllib/src/main/scala/org/apache/spark/mllib/regression/RidgeRegression.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/regression/RidgeRegression.scala b/mllib/src/main/scala/org/apache/spark/mllib/regression/RidgeRegression.scala index c8cad77..076ba35 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/regression/RidgeRegression.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/regression/RidgeRegression.scala @@ -45,7 +45,7 @@ class RidgeRegressionModel ( /** * Train a regression model with L2-regularization using Stochastic Gradient Descent. * This solves the l1-regularized least squares regression formulation - * f(weights) = 1/n ||A weights-y||^2 + regParam/2 ||weights||^2 + * f(weights) = 1/2n ||A weights-y||^2 + regParam/2 ||weights||^2 * Here the data matrix has n rows, and the input RDD holds the set of rows of A, each with * its corresponding right hand side label y. * See also the documentation for the precise formulation.
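With the factor of 2 from SPARK-4907 folded into `LeastSquaresGradient`, the two documented objectives read (A is the n-row data matrix, y the labels, λ the `regParam`):

```latex
% Lasso: L1-regularized least squares
f(w) = \frac{1}{2n}\,\lVert A w - y \rVert_2^2 + \lambda\,\lVert w \rVert_1

% Ridge: L2-regularized least squares
f(w) = \frac{1}{2n}\,\lVert A w - y \rVert_2^2 + \frac{\lambda}{2}\,\lVert w \rVert_2^2
```

The scaling by 1/(2n) instead of 1/n makes the least-squares gradient (1/n)Aᵀ(Aw − y) free of a stray factor of 2, which is what the doc update reflects.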
spark git commit: [HOTFIX] Add SPARK_VERSION to Spark package object.
Repository: spark Updated Branches: refs/heads/branch-1.0 44719e636 -> 78157d494 [HOTFIX] Add SPARK_VERSION to Spark package object. This helps to avoid build breaks when backporting patches that use org.apache.spark.SPARK_VERSION. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/78157d49 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/78157d49 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/78157d49 Branch: refs/heads/branch-1.0 Commit: 78157d4943def7a4fdb718a52f5a73d5128ca538 Parents: 44719e6 Author: Josh Rosen joshro...@databricks.com Authored: Mon Dec 29 22:00:13 2014 -0800 Committer: Josh Rosen joshro...@databricks.com Committed: Mon Dec 29 22:00:13 2014 -0800 -- core/src/main/scala/org/apache/spark/package.scala | 1 + 1 file changed, 1 insertion(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/78157d49/core/src/main/scala/org/apache/spark/package.scala -- diff --git a/core/src/main/scala/org/apache/spark/package.scala b/core/src/main/scala/org/apache/spark/package.scala index 5cdbc30..fb3a4c5 100644 --- a/core/src/main/scala/org/apache/spark/package.scala +++ b/core/src/main/scala/org/apache/spark/package.scala @@ -44,4 +44,5 @@ package org.apache package object spark { // For package docs only + val SPARK_VERSION = "1.0.3-SNAPSHOT" }
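A small sketch of why exposing a version constant helps backports (a hypothetical helper, not Spark's API): backported code can gate behavior on the running version instead of failing to compile when the constant is missing.

```scala
// Hypothetical holder object; Spark puts the constant in the
// org.apache.spark package object instead.
object SparkInfo {
  val SPARK_VERSION = "1.0.3-SNAPSHOT" // illustrative value

  // Split "major.minor.patch[-qualifier]" and compare numerically.
  def isAtLeast(major: Int, minor: Int): Boolean = {
    val parts = SPARK_VERSION.split("[.-]")
    val (maj, min) = (parts(0).toInt, parts(1).toInt)
    maj > major || (maj == major && min >= minor)
  }
}
```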
spark git commit: [HOTFIX] Add SPARK_VERSION to Spark package object.
Repository: spark Updated Branches: refs/heads/branch-1.1 3442b7bb6 -> d5e0a45ed [HOTFIX] Add SPARK_VERSION to Spark package object. This helps to avoid build breaks when backporting patches that use org.apache.spark.SPARK_VERSION. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d5e0a45e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d5e0a45e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d5e0a45e Branch: refs/heads/branch-1.1 Commit: d5e0a45edb00038f7f2aba3a34bcee1117bd98d9 Parents: 3442b7b Author: Josh Rosen joshro...@databricks.com Authored: Mon Dec 29 22:03:07 2014 -0800 Committer: Josh Rosen joshro...@databricks.com Committed: Mon Dec 29 22:03:07 2014 -0800 -- core/src/main/scala/org/apache/spark/package.scala | 1 + 1 file changed, 1 insertion(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d5e0a45e/core/src/main/scala/org/apache/spark/package.scala -- diff --git a/core/src/main/scala/org/apache/spark/package.scala b/core/src/main/scala/org/apache/spark/package.scala index 5cdbc30..8caff7e 100644 --- a/core/src/main/scala/org/apache/spark/package.scala +++ b/core/src/main/scala/org/apache/spark/package.scala @@ -44,4 +44,5 @@ package org.apache package object spark { // For package docs only + val SPARK_VERSION = "1.1.2-SNAPSHOT" }