[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-08-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/7697


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-08-02 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-127051469
  
@yu-iskw Thanks for adding this, and others for reviewing!  It looks good.  
My only comment is that it might be good to use the built-in MLlib methods like 
MLUtils.loadVectors to load data, rather than having new parsing methods in 
examples. Not a big deal though.





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-08-02 Thread yu-iskw
Github user yu-iskw commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-127092526
  
@srowen thank you for merging it!

@jkbradley thank you for your feedback! I agree that it would be better to
use `MLUtils.loadVectors`. However, it doesn't support the space-separated
format. So I was wondering whether I should stay consistent with the input
data format of the `spark.mllib KMeans` example or create a new one
regardless. In the end, I thought it would be better to stay consistent
with the `spark.mllib KMeans` example.
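
For context, the space-separated format mentioned here can be parsed with nothing beyond the JDK. The following is only a minimal sketch in plain Java, assuming one point per line with values separated by single spaces; the class and method names are hypothetical and are not taken from the PR.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the PR: parses one line of
// space-separated numbers into a double[].
final class SpaceSeparatedPoints {
  private static final Pattern SEPARATOR = Pattern.compile(" ");

  static double[] parseLine(String line) {
    // Trim first so leading or trailing blanks do not produce empty tokens.
    String[] tokens = SEPARATOR.split(line.trim());
    double[] point = new double[tokens.length];
    for (int i = 0; i < tokens.length; ++i) {
      point[i] = Double.parseDouble(tokens[i]);
    }
    return point;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(parseLine("0.0 0.1 9.2")));
  }
}
```

In a Spark job, the resulting array would then typically be wrapped with `Vectors.dense`, as the example under review does.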





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-08-02 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-127105448
  
Ohh, I see.  That's fine.  Thanks!





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-30 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/7697#discussion_r35861720
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml;
+
+import java.util.regex.Pattern;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.ml.clustering.KMeansModel;
+import org.apache.spark.ml.clustering.KMeans;
+import org.apache.spark.mllib.linalg.Vector;
+import org.apache.spark.mllib.linalg.VectorUDT;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.catalyst.expressions.GenericRow;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+
+/**
+ * An example demonstrating a k-means clustering.
+ * Run with
+ * <pre>
+ * bin/run-example ml.JavaSimpleParamsExample <file> <k>
+ * </pre>
+ */
+public class JavaKMeansExample {
+
+  private static class ParsePoint implements Function<String, Row> {
+    final private static Pattern separator = Pattern.compile(" ");
--- End diff --

This is picking nits, and something we can fix on merge, but the normal 
order of modifiers is `private static final ...`
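
For readers unfamiliar with the convention: the Java Language Specification describes a customary (not mandatory) order for field modifiers, in which the access modifier comes first, then `static`, then `final`. The sketch below uses a hypothetical class name; both declarations compile, and only the second follows the customary order.

```java
import java.util.regex.Pattern;

// Hypothetical class, used only to contrast the two modifier orders.
final class ModifierOrderExample {
  // Compiles, but the modifier order is unconventional.
  final private static Pattern UNCONVENTIONAL = Pattern.compile(" ");

  // Customary order: access modifier, then static, then final.
  private static final Pattern SEPARATOR = Pattern.compile(" ");

  // Both fields hold equivalent patterns; the order of modifiers
  // has no effect on behavior.
  static boolean samePattern() {
    return UNCONVENTIONAL.pattern().equals(SEPARATOR.pattern());
  }

  static int countTokens(String line) {
    return SEPARATOR.split(line).length;
  }
}
```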





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-30 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-126282672
  
I think this is pretty fine, minus one thing I can fix on merge. Any more 
comments?





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-126133450
  
[Test build #38919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38919/consoleFull) for PR 7697 at commit [`7137bad`](https://github.com/apache/spark/commit/7137bad68a46bf47a5685708f36b7df72dc68146).





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-29 Thread yu-iskw
Github user yu-iskw commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-126139749
  
@techaddict thank you for your comments.





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-126133200
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-126133229
  
Merged build started.





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-126136654
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-29 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/7697#discussion_r35740850
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/ml/KMeansExample.scala ---
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml
+
+import org.apache.spark.{SparkContext, SparkConf}
+import org.apache.spark.mllib.linalg.Vectors
+import org.apache.spark.ml.clustering.KMeans
+import org.apache.spark.sql.{Row, SQLContext}
+import org.apache.spark.sql.types.{StructField, StructType}
+
+
+/**
+ * An example demonstrating a k-means clustering.
+ * Run with
+ * {{{
+ * bin/run-example ml.KMeansExample <file> <k>
+ * }}}
+ */
+object KMeansExample {
+
+  final val FEATURES_COL = "features"
+
+  def main(args: Array[String]): Unit = {
+    if (args.length != 2) {
+      // scalastyle:off println
+      System.err.println("Usage: ml.KMeansExample <file> <k>")
+      // scalastyle:of println
+      System.exit(1)
+    }
+    val input = args(0)
+    val k = args(1).toInt
+
+    // Creates a Spark context and a SQL context
+    val conf = new SparkConf().setAppName(s"${this.getClass.getSimpleName}")
+    val sc = new SparkContext(conf)
+    val sqlContext = new SQLContext(sc)
+    import org.apache.spark.mllib.linalg.VectorUDT
--- End diff --

why not import this at the beginning ?





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-29 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/7697#discussion_r35740783
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/ml/KMeansExample.scala ---
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml
+
+import org.apache.spark.{SparkContext, SparkConf}
+import org.apache.spark.mllib.linalg.Vectors
+import org.apache.spark.ml.clustering.KMeans
+import org.apache.spark.sql.{Row, SQLContext}
+import org.apache.spark.sql.types.{StructField, StructType}
+
+
+/**
+ * An example demonstrating a k-means clustering.
+ * Run with
+ * {{{
+ * bin/run-example ml.KMeansExample <file> <k>
+ * }}}
+ */
+object KMeansExample {
+
+  final val FEATURES_COL = "features"
+
+  def main(args: Array[String]): Unit = {
+    if (args.length != 2) {
+      // scalastyle:off println
+      System.err.println("Usage: ml.KMeansExample <file> <k>")
+      // scalastyle:of println
--- End diff --

I think you meant to write `scalastyle:on println`.





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-125866394
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-29 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-125851738
  
LGTM pending tests; wouldn't hurt to have @jkbradley look.





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-29 Thread yu-iskw
Github user yu-iskw commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-125852778
  
@srowen Thank you for reviewing it!
@jkbradley Could you take a glance at this?





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-125866318
  
[Test build #38807 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38807/console) for PR 7697 at commit [`554e574`](https://github.com/apache/spark/commit/554e574646a9ed552cc7d94ac9ece2f8124f8c96).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class JavaKMeansExample `






[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-126136354
  
[Test build #38919 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38919/console) for PR 7697 at commit [`7137bad`](https://github.com/apache/spark/commit/7137bad68a46bf47a5685708f36b7df72dc68146).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class JavaKMeansExample `






[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-28 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/7697#discussion_r35619593
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/ml/KMeansExample.scala ---
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml
+
+import org.apache.spark.{SparkContext, SparkConf}
+import org.apache.spark.mllib.linalg.Vectors
+import org.apache.spark.ml.clustering.KMeans
+import org.apache.spark.sql.{Row, SQLContext}
+import org.apache.spark.sql.types.{StructField, StructType}
+
+
+/**
+ * An example demonstrating a k-means clustering.
+ * Run with
+ * {{{
+ * bin/run-example ml.KMeansExample <file> <k>
+ * }}}
+ */
+object KMeansExample {
+
+  final val FEATURES_COL = "features"
+
+  def main(args: Array[String]): Unit = {
+    if (args.length != 2) {
+      // scalastyle:off println
+      System.err.println("Usage: ml.KMeansExample <file> <k>")
+      // scalastyle:of println
+      System.exit(1)
+    }
+    val input = args(0)
+    val k = args(1).toInt
+
+    // Creates a Spark context and a SQL context
+    val conf = new SparkConf().setAppName(s"${this.getClass.getSimpleName}")
+    val sc = new SparkContext(conf)
+    val sqlContext = new SQLContext(sc)
+    import org.apache.spark.mllib.linalg.VectorUDT
+
+    // Loads data
+    val rowRDD = sc.textFile(input).filter(l => l != "")
+      .map(_.split(" ").map(v => java.lang.Double.parseDouble(v)))
--- End diff --

`_.toDouble` instead of using `java.lang.Double`?





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-28 Thread yu-iskw
Github user yu-iskw commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-125478031
  
@srowen I made the Scala and Java examples simpler. Could you review them?





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-28 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/7697#discussion_r35619640
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/ml/KMeansExample.scala ---
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml
+
+import org.apache.spark.{SparkContext, SparkConf}
+import org.apache.spark.mllib.linalg.Vectors
+import org.apache.spark.ml.clustering.KMeans
+import org.apache.spark.sql.{Row, SQLContext}
+import org.apache.spark.sql.types.{StructField, StructType}
+
+
+/**
+ * An example demonstrating a k-means clustering.
+ * Run with
+ * {{{
+ * bin/run-example ml.KMeansExample <file> <k>
+ * }}}
+ */
+object KMeansExample {
+
+  final val FEATURES_COL = "features"
+
+  def main(args: Array[String]): Unit = {
+    if (args.length != 2) {
+      // scalastyle:off println
+      System.err.println("Usage: ml.KMeansExample <file> <k>")
+      // scalastyle:of println
+      System.exit(1)
+    }
+    val input = args(0)
+    val k = args(1).toInt
+
+    // Creates a Spark context and a SQL context
+    val conf = new SparkConf().setAppName(s"${this.getClass.getSimpleName}")
+    val sc = new SparkContext(conf)
+    val sqlContext = new SQLContext(sc)
+    import org.apache.spark.mllib.linalg.VectorUDT
+
+    // Loads data
+    val rowRDD = sc.textFile(input).filter(l => l != "")
+      .map(_.split(" ").map(v => java.lang.Double.parseDouble(v)))
--- End diff --

I think the filter condition can be tightened to `_.nonEmpty`? and likewise 
below can you `map(Vectors.dense)`? I forget whether that syntax will work.





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-28 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/7697#discussion_r35619518
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml;
+
+import java.util.regex.Pattern;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.ml.clustering.KMeansModel;
+import org.apache.spark.mllib.linalg.Vector;
+import org.apache.spark.mllib.linalg.VectorUDT;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.catalyst.expressions.GenericRow;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+
+/**
+ * An example demonstrating a k-means clustering.
+ * Run with
+ * <pre>
+ * bin/run-example ml.JavaSimpleParamsExample <file> <k>
+ * </pre>
+ */
+public class JavaKMeansExample {
+
+  private static class ParsePoint implements Function<String, Row> {
+    private static Pattern separater = Pattern.compile(" ");
--- End diff --

Nit: The spelling should still be `separator` and it can be `final`.





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-28 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/7697#discussion_r35619529
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml;
+
+import java.util.regex.Pattern;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.ml.clustering.KMeansModel;
+import org.apache.spark.mllib.linalg.Vector;
+import org.apache.spark.mllib.linalg.VectorUDT;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.catalyst.expressions.GenericRow;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+
+/**
+ * An example demonstrating a k-means clustering.
+ * Run with
+ * <pre>
+ * bin/run-example ml.JavaSimpleParamsExample <file> <k>
+ * </pre>
+ */
+public class JavaKMeansExample {
+
+  private static class ParsePoint implements Function<String, Row> {
+    private static Pattern separater = Pattern.compile(" ");
+
+    @Override
+    public Row call(String line) {
+      String[] tok = separater.split(line);
+      double[] point = new double[tok.length];
+      for (int i = 0; i < tok.length; ++i) {
+        point[i] = Double.parseDouble(tok[i]);
+      }
+      Vector[] points = {Vectors.dense(point)};
+      Row row = new GenericRow(points);
--- End diff --

While we're here, this can be returned directly, skipping a local var
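
The suggestion can be illustrated without Spark on the classpath. In this plain-Java sketch a single-element `List` stands in for the Spark `Row`; that substitution is an assumption made only to keep the snippet self-contained, and the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical class: a List<double[]> stands in for Spark's Row.
final class ReturnDirectlyExample {
  // Before: the result is bound to a local variable that is only returned.
  static List<double[]> wrapWithLocal(double[] point) {
    List<double[]> row = Collections.singletonList(point);
    return row;
  }

  // After: the expression is returned directly, skipping the local variable.
  static List<double[]> wrapDirectly(double[] point) {
    return Collections.singletonList(point);
  }
}
```

Both methods behave identically; the second form is just shorter.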


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-28 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/7697#discussion_r35619542
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml;
+
+import java.util.regex.Pattern;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.ml.clustering.KMeansModel;
+import org.apache.spark.mllib.linalg.Vector;
+import org.apache.spark.mllib.linalg.VectorUDT;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.catalyst.expressions.GenericRow;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+
+/**
+ * An example demonstrating a k-means clustering.
+ * Run with
+ * <pre>
+ * bin/run-example ml.JavaSimpleParamsExample <file> <k>
+ * </pre>
+ */
+public class JavaKMeansExample {
+
+  private static class ParsePoint implements Function<String, Row> {
+    private static Pattern separater = Pattern.compile(" ");
+
+    @Override
+    public Row call(String line) {
+      String[] tok = separater.split(line);
+      double[] point = new double[tok.length];
+      for (int i = 0; i < tok.length; ++i) {
+        point[i] = Double.parseDouble(tok[i]);
+      }
+      Vector[] points = {Vectors.dense(point)};
+      Row row = new GenericRow(points);
+      return row;
+    }
+  }
+
+  public static void main(String[] args) {
+    if (args.length != 2) {
+      System.err.println("Usage: ml.JavaKMeansExample <file> <k>");
+      System.exit(1);
+    }
+    String inputFile = args[0];
+    int k = Integer.parseInt(args[1]);
+
+    // Parses the arguments
+    SparkConf conf = new SparkConf().setAppName("JavaKMeansExample");
+    JavaSparkContext jsc = new JavaSparkContext(conf);
+    SQLContext sqlContext = new SQLContext(jsc);
+
+    // Loads data
+    JavaRDD<Row> points = jsc.textFile(inputFile).map(new ParsePoint());
+    StructField[] fields = new StructField[1];
+    fields[0] = new StructField("features", new VectorUDT(), false, Metadata.empty());
--- End diff --

You can use the same `Foo[] = { ... };` declaration as above here?
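
For illustration, a sketch of the array-initializer form in plain Java. `Field` is a hypothetical stand-in for `StructField`, introduced only so the snippet compiles without Spark.

```java
class ArrayInitSketch {
  // Hypothetical stand-in for StructField with just a name and a nullable flag.
  static final class Field {
    final String name;
    final boolean nullable;

    Field(String name, boolean nullable) {
      this.name = name;
      this.nullable = nullable;
    }
  }

  static Field[] schemaFields() {
    // Declares and fills the array in one statement, replacing
    // "new Field[1]" followed by "fields[0] = ...".
    Field[] fields = {new Field("features", false)};
    return fields;
  }
}
```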





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-28 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/7697#discussion_r35619559
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml;
+
+import java.util.regex.Pattern;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.ml.clustering.KMeansModel;
+import org.apache.spark.mllib.linalg.Vector;
+import org.apache.spark.mllib.linalg.VectorUDT;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.catalyst.expressions.GenericRow;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+
+/**
+ * An example demonstrating a k-means clustering.
+ * Run with
+ * <pre>
+ * bin/run-example ml.JavaSimpleParamsExample <file> <k>
+ * </pre>
+ */
+public class JavaKMeansExample {
+
+  private static class ParsePoint implements Function<String, Row> {
+    private static Pattern separater = Pattern.compile(" ");
+
+    @Override
+    public Row call(String line) {
+      String[] tok = separater.split(line);
+      double[] point = new double[tok.length];
+      for (int i = 0; i < tok.length; ++i) {
+        point[i] = Double.parseDouble(tok[i]);
+      }
+      Vector[] points = {Vectors.dense(point)};
+      Row row = new GenericRow(points);
+      return row;
+    }
+  }
+
+  public static void main(String[] args) {
+    if (args.length != 2) {
+      System.err.println("Usage: ml.JavaKMeansExample <file> <k>");
+      System.exit(1);
+    }
+    String inputFile = args[0];
+    int k = Integer.parseInt(args[1]);
+
+    // Parses the arguments
+    SparkConf conf = new SparkConf().setAppName("JavaKMeansExample");
+    JavaSparkContext jsc = new JavaSparkContext(conf);
+    SQLContext sqlContext = new SQLContext(jsc);
+
+    // Loads data
+    JavaRDD<Row> points = jsc.textFile(inputFile).map(new ParsePoint());
+    StructField[] fields = new StructField[1];
+    fields[0] = new StructField("features", new VectorUDT(), false, Metadata.empty());
+    StructType schema = new StructType(fields);
+    DataFrame dataset = sqlContext.createDataFrame(points, schema);
+
+    // Trains a k-means model
+    org.apache.spark.ml.clustering.KMeans kmeans = new org.apache.spark.ml.clustering.KMeans()
--- End diff --

Can this be an import or am I missing why it has to be fully qualified here?
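
For context, the usual reason to fully qualify a name in Java is a clash between two classes with the same simple name. A small sketch with JDK classes, chosen here only for illustration and unrelated to the PR: when just one of the two is used in a file, a plain import is enough.

```java
import java.util.Date;

class QualifiedNameSketch {
  static long[] epochMillis() {
    // Resolved through the import above.
    Date utilDate = new Date(0L);
    // Same simple name as the imported class, so this one must be fully qualified.
    java.sql.Date sqlDate = new java.sql.Date(0L);
    return new long[] {utilDate.getTime(), sqlDate.getTime()};
  }
}
```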





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-125849111
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-125849122
  
Merged build started.





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-125849265
  
  [Test build #38807 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38807/consoleFull) for PR 7697 at commit [`554e574`](https://github.com/apache/spark/commit/554e574646a9ed552cc7d94ac9ece2f8124f8c96).





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-27 Thread yu-iskw
Github user yu-iskw commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-125226470
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-125225163
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-125227555
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-125227612
  
Merged build started.





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-27 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/7697#discussion_r35547864
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml;
+
+import java.util.regex.Pattern;
+
+import org.apache.commons.cli.*;
--- End diff --

I think we should not introduce a dependency on Commons CLI just for this. 
It's ancient. Actually, I see one other use of this in an example, which 
shouldn't be there as this is an undeclared dependency. It's not crazy to fix 
that here; it could be a separate PR though.
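
A rough sketch of handling two positional arguments without any CLI library, along the lines of what a small example needs. The `Params` holder and the exception-based error path are assumptions made for this snippet, not code from the PR.

```java
class ArgsSketch {
  // Hypothetical holder for the two positional arguments.
  static final class Params {
    final String inputFile;
    final int k;

    Params(String inputFile, int k) {
      this.inputFile = inputFile;
      this.k = k;
    }
  }

  // Parses "<file> <k>" and throws IllegalArgumentException with a usage message otherwise.
  static Params parse(String[] args) {
    if (args.length != 2) {
      throw new IllegalArgumentException("Usage: ml.JavaKMeansExample <file> <k>");
    }
    return new Params(args[0], Integer.parseInt(args[1]));
  }
}
```

A `main` method could catch the exception, print its message to `System.err`, and exit with a non-zero status.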





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-27 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/7697#discussion_r35548056
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml;
+
+import java.util.regex.Pattern;
+
+import org.apache.commons.cli.*;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.ml.clustering.KMeansModel;
+import org.apache.spark.mllib.clustering.KMeans;
+import org.apache.spark.mllib.linalg.Vector;
+import org.apache.spark.mllib.linalg.VectorUDT;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.catalyst.expressions.GenericRow;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+
+/**
+ * An example demonstrating a k-means clustering.
+ * Run with
+ * {{{
+ * bin/run-example ml.JavaSimpleParamsExample [options] -k <int> -input <file>
+ * }}}
+ */
+public class JavaKMeansExample {
+
+  private static class Params {
+    String input;
+    Integer k = 2;
+    Integer maxIter = 20;
+    Integer runs = 1;
+    Double epsilon = 1E-4;
+    Long seed = 1L;
+    String initMode = KMeans.K_MEANS_PARALLEL();
+    Integer initSteps = 5;
+  }
+
+  private static class ParsePoint implements Function<String, Row> {
+    Pattern separater = Pattern.compile(" ");
--- End diff --

Nit: separator. This should be private static if possible.
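
Something like the following, sketched in plain Java. The `final` modifier and the constant-style name are additions beyond the comment above, and `tokens` is a hypothetical helper.

```java
import java.util.regex.Pattern;

class SeparatorSketch {
  // Compiled once for the class instead of once per instance.
  private static final Pattern SEPARATOR = Pattern.compile(" ");

  // Splits a space-separated line into its tokens.
  static String[] tokens(String line) {
    return SEPARATOR.split(line);
  }
}
```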





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-27 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/7697#discussion_r35548012
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml;
+
+import java.util.regex.Pattern;
+
+import org.apache.commons.cli.*;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.ml.clustering.KMeansModel;
+import org.apache.spark.mllib.clustering.KMeans;
+import org.apache.spark.mllib.linalg.Vector;
+import org.apache.spark.mllib.linalg.VectorUDT;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.catalyst.expressions.GenericRow;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+
+/**
+ * An example demonstrating a k-means clustering.
+ * Run with
+ * {{{
--- End diff --

Unless I'm really out of date, this doesn't work in Javadoc. `{@code ...}` 
works. Though that is not used in Spark consistently.
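
For reference, a sketch of the Javadoc form, wrapping `{@code ...}` in a `pre` element so the command keeps its line breaks. The class name and the `usage` method are placeholders for this snippet.

```java
/**
 * An example demonstrating k-means clustering.
 * Run with
 * <pre>{@code
 * bin/run-example ml.JavaKMeansExample <file> <k>
 * }</pre>
 */
class JavadocSketch {
  /** Returns the usage string; {@code <file>} and {@code <k>} are positional arguments. */
  static String usage() {
    return "Usage: ml.JavaKMeansExample <file> <k>";
  }
}
```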





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-125225198
  
Merged build started.





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-125236591
  
  [Test build #38551 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38551/console) for PR 7697 at commit [`b09ec13`](https://github.com/apache/spark/commit/b09ec134701d18b0710ff3930fd8e1dd4254ab08).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class JavaKMeansExample `
  * `  case class Params(`






[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-125226098
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-125228083
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-125227789
  
  [Test build #115 has started](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/115/consoleFull) for PR 7697 at commit [`b09ec13`](https://github.com/apache/spark/commit/b09ec134701d18b0710ff3930fd8e1dd4254ab08).





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-125233978
  
  [Test build #115 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/115/console) for PR 7697 at commit [`b09ec13`](https://github.com/apache/spark/commit/b09ec134701d18b0710ff3930fd8e1dd4254ab08).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class JavaKMeansExample `
  * `  case class Params(`






[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-125234331
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-27 Thread yu-iskw
GitHub user yu-iskw opened a pull request:

https://github.com/apache/spark/pull/7697

[SPARK-9149][ML][Examples] Add an example of spark.ml KMeans

[SPARK-9149] Add an example of spark.ml KMeans - ASF JIRA 
https://issues.apache.org/jira/browse/SPARK-9149

@jkbradley Should we support other data formats, such as TSV or CSV? I have 
implemented these examples to support only space-separated files, the same as 
the example for `spark.mllib`'s `KMeans`.
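
If other formats were wanted, one possible direction is a more permissive delimiter. This is only a sketch under that assumption, in plain Java without Spark types; the pattern and class name are hypothetical and not part of this PR.

```java
import java.util.regex.Pattern;

class DelimiterSketch {
  // Hypothetical delimiter: a comma with optional surrounding whitespace,
  // or a run of whitespace (spaces or tabs).
  private static final Pattern DELIMITER = Pattern.compile("\\s*,\\s*|\\s+");

  // Parses one non-empty line of numbers separated by spaces, tabs, or commas.
  static double[] parse(String line) {
    String[] tok = DELIMITER.split(line.trim());
    double[] point = new double[tok.length];
    for (int i = 0; i < tok.length; ++i) {
      point[i] = Double.parseDouble(tok[i]);
    }
    return point;
  }
}
```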

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yu-iskw/spark SPARK-9149

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7697.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7697


commit b09ec134701d18b0710ff3930fd8e1dd4254ab08
Author: Yu ISHIKAWA yuu.ishik...@gmail.com
Date:   2015-07-24T08:59:01Z

[SPARK-9149][ML][Examples] Add an example of spark.ml KMeans







[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-125228153
  
Merged build started.





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-125229686
  
  [Test build #38551 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38551/consoleFull) for PR 7697 at commit [`b09ec13`](https://github.com/apache/spark/commit/b09ec134701d18b0710ff3930fd8e1dd4254ab08).





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-125236710
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-27 Thread yu-iskw
Github user yu-iskw commented on a diff in the pull request:

https://github.com/apache/spark/pull/7697#discussion_r35598240
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml;
+
+import java.util.regex.Pattern;
+
+import org.apache.commons.cli.*;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.ml.clustering.KMeansModel;
+import org.apache.spark.mllib.clustering.KMeans;
+import org.apache.spark.mllib.linalg.Vector;
+import org.apache.spark.mllib.linalg.VectorUDT;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.catalyst.expressions.GenericRow;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+
+/**
+ * An example demonstrating a k-means clustering.
+ * Run with
+ * {{{
--- End diff --

I confused it with Scaladoc syntax. I'll fix it. Thanks.





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-27 Thread yu-iskw
Github user yu-iskw commented on a diff in the pull request:

https://github.com/apache/spark/pull/7697#discussion_r35598247
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml;
+
+import java.util.regex.Pattern;
+
+import org.apache.commons.cli.*;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.ml.clustering.KMeansModel;
+import org.apache.spark.mllib.clustering.KMeans;
+import org.apache.spark.mllib.linalg.Vector;
+import org.apache.spark.mllib.linalg.VectorUDT;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.catalyst.expressions.GenericRow;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+
+/**
+ * An example demonstrating a k-means clustering.
+ * Run with
+ * {{{
+ * bin/run-example ml.JavaSimpleParamsExample [options] -k int -input file
+ * }}}
+ */
+public class JavaKMeansExample {
+
+  private static class Params {
+    String input;
+    Integer k = 2;
+    Integer maxIter = 20;
+    Integer runs = 1;
+    Double epsilon = 1E-4;
+    Long seed = 1L;
+    String initMode = KMeans.K_MEANS_PARALLEL();
+    Integer initSteps = 5;
+  }
+
+  private static class ParsePoint implements Function<String, Row> {
+    Pattern separater = Pattern.compile(" ");
--- End diff --

Alright.





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-125399245
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-125401233
  
  [Test build #38619 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38619/console) for PR 7697 at commit [`3e0862d`](https://github.com/apache/spark/commit/3e0862d760f72f0026af34fe5b9e9c2889fcf4fd).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class JavaKMeansExample `






[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-125401272
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-125399260
  
Merged build started.





[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...

2015-07-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7697#issuecomment-125399716
  
  [Test build #38619 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38619/consoleFull) for PR 7697 at commit [`3e0862d`](https://github.com/apache/spark/commit/3e0862d760f72f0026af34fe5b9e9c2889fcf4fd).

