spark git commit: [SPARK-5503][MLLIB] Example code for Power Iteration Clustering

2015-02-13 Thread meng
Repository: spark
Updated Branches:
  refs/heads/master c0ccd2564 -> e1a1ff810


[SPARK-5503][MLLIB] Example code for Power Iteration Clustering

Author: sboeschhuawei stephen.boe...@huawei.com

Closes #4495 from javadba/picexamples and squashes the following commits:

3c84b14 [sboeschhuawei] PIC Examples updates from Xiangrui's comments round 5
2878675 [sboeschhuawei] Fourth round with xiangrui on PICExample
d7ac350 [sboeschhuawei] Updates to PICExample from Xiangrui's comments round 3
d7f0cba [sboeschhuawei] Updates to PICExample from Xiangrui's comments round 3
cef28f4 [sboeschhuawei] Further updates to PICExample from Xiangrui's comments
f7ff43d [sboeschhuawei] Update to PICExample from Xiangrui's comments
efeec45 [sboeschhuawei] Update to PICExample from Xiangrui's comments
03e8de4 [sboeschhuawei] Added PICExample
c509130 [sboeschhuawei] placeholder for pic examples
5864d4a [sboeschhuawei] placeholder for pic examples


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e1a1ff81
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e1a1ff81
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e1a1ff81

Branch: refs/heads/master
Commit: e1a1ff8108463ca79299ec0eb555a0c8db9dffa0
Parents: c0ccd25
Author: sboeschhuawei <stephen.boe...@huawei.com>
Authored: Fri Feb 13 09:45:57 2015 -0800
Committer: Xiangrui Meng <m...@databricks.com>
Committed: Fri Feb 13 09:45:57 2015 -0800

--
 .../mllib/PowerIterationClusteringExample.scala | 160 +++
 1 file changed, 160 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/e1a1ff81/examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala
--
diff --git a/examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala b/examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala
new file mode 100644
index 000..b2373ad
--- /dev/null
+++ b/examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.mllib
+
+import org.apache.log4j.{Level, Logger}
+import scopt.OptionParser
+
+import org.apache.spark.mllib.clustering.PowerIterationClustering
+import org.apache.spark.rdd.RDD
+import org.apache.spark.{SparkConf, SparkContext}
+
+/**
+ * An example Power Iteration Clustering http://www.icml2010.org/papers/387.pdf app.
+ * Takes an input of K concentric circles and the number of points in the innermost circle.
+ * The output should be K clusters - each cluster containing precisely the points associated
+ * with each of the input circles.
+ *
+ * Run with
+ * {{{
+ * ./bin/run-example mllib.PowerIterationClusteringExample [options]
+ *
+ * Where options include:
+ *   k:  Number of circles/clusters
+ *   n:  Number of sampled points on innermost circle. There are proportionally more points
+ *       within the outer/larger circles
+ *   maxIterations:  Number of Power Iterations
+ *   outerRadius:  Radius of the outermost of the concentric circles
+ * }}}
+ *
+ * Here is a sample run and output:
+ *
+ * ./bin/run-example mllib.PowerIterationClusteringExample
+ * -k 3 --n 30 --maxIterations 15
+ *
+ * Cluster assignments: 1 -> [0,1,2,3,4],2 -> [5,6,7,8,9,10,11,12,13,14],
+ * 0 -> [15,16,17,18,19,20,21,22,23,24,25,26,27,28,29]
+ *
+ *
+ * If you use it as a template to create your own app, please use `spark-submit` to submit your app.
+ */
+object PowerIterationClusteringExample {
+
+  case class Params(
+      input: String = null,
+      k: Int = 3,
+      numPoints: Int = 5,
+      maxIterations: Int = 10,
+      outerRadius: Double = 3.0
+    ) extends AbstractParams[Params]
+
+  def main(args: Array[String]) {
+    val defaultParams = Params()
+
+    val parser = new OptionParser[Params]("PIC Circles") {
+      head("PowerIterationClusteringExample: an example PIC app using concentric

spark git commit: [SPARK-5503][MLLIB] Example code for Power Iteration Clustering

2015-02-13 Thread meng
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 5c883df09 -> 5e6394222


[SPARK-5503][MLLIB] Example code for Power Iteration Clustering

Author: sboeschhuawei stephen.boe...@huawei.com

Closes #4495 from javadba/picexamples and squashes the following commits:

3c84b14 [sboeschhuawei] PIC Examples updates from Xiangrui's comments round 5
2878675 [sboeschhuawei] Fourth round with xiangrui on PICExample
d7ac350 [sboeschhuawei] Updates to PICExample from Xiangrui's comments round 3
d7f0cba [sboeschhuawei] Updates to PICExample from Xiangrui's comments round 3
cef28f4 [sboeschhuawei] Further updates to PICExample from Xiangrui's comments
f7ff43d [sboeschhuawei] Update to PICExample from Xiangrui's comments
efeec45 [sboeschhuawei] Update to PICExample from Xiangrui's comments
03e8de4 [sboeschhuawei] Added PICExample
c509130 [sboeschhuawei] placeholder for pic examples
5864d4a [sboeschhuawei] placeholder for pic examples

(cherry picked from commit e1a1ff8108463ca79299ec0eb555a0c8db9dffa0)
Signed-off-by: Xiangrui Meng m...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5e639422
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5e639422
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5e639422

Branch: refs/heads/branch-1.3
Commit: 5e639422207a113eee4ea3796c221004664ede1a
Parents: 5c883df
Author: sboeschhuawei stephen.boe...@huawei.com
Authored: Fri Feb 13 09:45:57 2015 -0800
Committer: Xiangrui Meng m...@databricks.com
Committed: Fri Feb 13 09:46:03 2015 -0800

--
 .../mllib/PowerIterationClusteringExample.scala | 160 +++
 1 file changed, 160 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/5e639422/examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala
--