[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/7697 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-127051469 @yu-iskw Thanks for adding this, and others for reviewing! It looks good. My only comment is that it might be good to use the built-in MLlib methods like MLUtils.loadVectors to load data, rather than having new parsing methods in examples. Not a big deal though.
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-127092526 @srowen thank you for merging it! @jkbradley thank you for your feedback! I agree that it would be better to use `MLUtils.loadVectors`. However, it doesn't support the space-separated format. So I was wondering whether I should keep consistency with the input data format of the `spark.mllib KMeans` example or create a new one regardless. In the end, I thought it would be better to stay consistent with the `spark.mllib KMeans` example.
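To make the format mismatch concrete: the example input holds lines such as `0.0 0.1 0.2`, while, as far as I understand it, `MLUtils.loadVectors` reads the bracketed text form that a vector's `toString` produces (for example `[0.0,0.1,0.2]`). The following is a minimal plain-Java sketch of the two forms, with no Spark dependency. The class and method names are hypothetical and exist only for illustration; they are not part of the pull request.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class SpaceSeparatedFormatSketch {
    private static final Pattern SEPARATOR = Pattern.compile(" ");

    /** Parses one space-separated line into its numeric values. */
    static double[] parseLine(String line) {
        String[] tok = SEPARATOR.split(line);
        double[] point = new double[tok.length];
        for (int i = 0; i < tok.length; ++i) {
            point[i] = Double.parseDouble(tok[i]);
        }
        return point;
    }

    /** Renders the values in a bracketed, comma-separated dense form (assumed target format). */
    static String toBracketed(double[] values) {
        return Arrays.stream(values)
            .mapToObj(d -> Double.toString(d))
            .collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) {
        System.out.println(toBracketed(parseLine("0.0 0.1 0.2")));
    }
}
```

Converting the data this way would be an extra preprocessing step, which is the cost weighed above against keeping the input format of the existing `spark.mllib` example.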
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-127105448 Ohh, I see. That's fine. Thanks!
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/7697#discussion_r35861720 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java --- @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.examples.ml; + +import java.util.regex.Pattern; + +import org.apache.spark.SparkConf; +import org.apache.spark.api.java.JavaRDD; +import org.apache.spark.api.java.JavaSparkContext; +import org.apache.spark.api.java.function.Function; +import org.apache.spark.ml.clustering.KMeansModel; +import org.apache.spark.ml.clustering.KMeans; +import org.apache.spark.mllib.linalg.Vector; +import org.apache.spark.mllib.linalg.VectorUDT; +import org.apache.spark.mllib.linalg.Vectors; +import org.apache.spark.sql.DataFrame; +import org.apache.spark.sql.Row; +import org.apache.spark.sql.SQLContext; +import org.apache.spark.sql.catalyst.expressions.GenericRow; +import org.apache.spark.sql.types.Metadata; +import org.apache.spark.sql.types.StructField; +import org.apache.spark.sql.types.StructType; + + +/** + * An example demonstrating a k-means clustering. 
+ * Run with
+ * <pre>
+ * bin/run-example ml.JavaSimpleParamsExample <file> <k>
+ * </pre>
+ */
+public class JavaKMeansExample {
+
+  private static class ParsePoint implements Function<String, Row> {
+    final private static Pattern separator = Pattern.compile(" ");
--- End diff --

This is picking nits, and something we can fix on merge, but the normal order of modifiers is `private static final ...`
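For reference, the compiler accepts field modifiers in any order, but the Java Language Specification notes that it is customary to write the access modifier first, then `static`, then `final`. A minimal sketch of the conventional form follows; the class name, the upper-case constant name, and the `tokens` helper are assumptions made for illustration, not code from the pull request.

```java
import java.util.regex.Pattern;

final class ModifierOrderSketch {
    // Customary order: access modifier, then static, then final.
    private static final Pattern SEPARATOR = Pattern.compile(" ");

    /** Splits a space-separated line into its tokens. */
    static String[] tokens(String line) {
        return SEPARATOR.split(line);
    }
}
```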
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-126282672 I think this is pretty fine, minus one thing I can fix on merge. Any more comments?
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-126133450 [Test build #38919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38919/consoleFull) for PR 7697 at commit [`7137bad`](https://github.com/apache/spark/commit/7137bad68a46bf47a5685708f36b7df72dc68146).
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-126139749 @techaddict thank you for your comments.
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-126133200 Merged build triggered.
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-126133229 Merged build started.
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-126136654 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/7697#discussion_r35740850 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/KMeansExample.scala --- @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.examples.ml + +import org.apache.spark.{SparkContext, SparkConf} +import org.apache.spark.mllib.linalg.Vectors +import org.apache.spark.ml.clustering.KMeans +import org.apache.spark.sql.{Row, SQLContext} +import org.apache.spark.sql.types.{StructField, StructType} + + +/** + * An example demonstrating a k-means clustering. 
+ * Run with
+ * {{{
+ * bin/run-example ml.KMeansExample <file> <k>
+ * }}}
+ */
+object KMeansExample {
+
+  final val FEATURES_COL = "features"
+
+  def main(args: Array[String]): Unit = {
+    if (args.length != 2) {
+      // scalastyle:off println
+      System.err.println("Usage: ml.KMeansExample <file> <k>")
+      // scalastyle:of println
+      System.exit(1)
+    }
+    val input = args(0)
+    val k = args(1).toInt
+
+    // Creates a Spark context and a SQL context
+    val conf = new SparkConf().setAppName(s"${this.getClass.getSimpleName}")
+    val sc = new SparkContext(conf)
+    val sqlContext = new SQLContext(sc)
+    import org.apache.spark.mllib.linalg.VectorUDT
--- End diff --

why not import this at the beginning?
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/7697#discussion_r35740783
--- Diff: examples/src/main/scala/org/apache/spark/examples/ml/KMeansExample.scala ---
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml
+
+import org.apache.spark.{SparkContext, SparkConf}
+import org.apache.spark.mllib.linalg.Vectors
+import org.apache.spark.ml.clustering.KMeans
+import org.apache.spark.sql.{Row, SQLContext}
+import org.apache.spark.sql.types.{StructField, StructType}
+
+
+/**
+ * An example demonstrating a k-means clustering.
+ * Run with
+ * {{{
+ * bin/run-example ml.KMeansExample <file> <k>
+ * }}}
+ */
+object KMeansExample {
+
+  final val FEATURES_COL = "features"
+
+  def main(args: Array[String]): Unit = {
+    if (args.length != 2) {
+      // scalastyle:off println
+      System.err.println("Usage: ml.KMeansExample <file> <k>")
+      // scalastyle:of println
--- End diff --

I think you meant to write `scalastyle:on println`.
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-125866394 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-125851738 LGTM pending tests; wouldn't hurt to have @jkbradley look.
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-125852778 @srowen Thank you for reviewing it! @jkbradley Could you take a glance at this?
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-125866318 [Test build #38807 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38807/console) for PR 7697 at commit [`554e574`](https://github.com/apache/spark/commit/554e574646a9ed552cc7d94ac9ece2f8124f8c96).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `public class JavaKMeansExample`
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-126136354 [Test build #38919 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38919/console) for PR 7697 at commit [`7137bad`](https://github.com/apache/spark/commit/7137bad68a46bf47a5685708f36b7df72dc68146).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `public class JavaKMeansExample`
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/7697#discussion_r35619593 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/KMeansExample.scala --- @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.examples.ml + +import org.apache.spark.{SparkContext, SparkConf} +import org.apache.spark.mllib.linalg.Vectors +import org.apache.spark.ml.clustering.KMeans +import org.apache.spark.sql.{Row, SQLContext} +import org.apache.spark.sql.types.{StructField, StructType} + + +/** + * An example demonstrating a k-means clustering. 
+ * Run with
+ * {{{
+ * bin/run-example ml.KMeansExample <file> <k>
+ * }}}
+ */
+object KMeansExample {
+
+  final val FEATURES_COL = "features"
+
+  def main(args: Array[String]): Unit = {
+    if (args.length != 2) {
+      // scalastyle:off println
+      System.err.println("Usage: ml.KMeansExample <file> <k>")
+      // scalastyle:of println
+      System.exit(1)
+    }
+    val input = args(0)
+    val k = args(1).toInt
+
+    // Creates a Spark context and a SQL context
+    val conf = new SparkConf().setAppName(s"${this.getClass.getSimpleName}")
+    val sc = new SparkContext(conf)
+    val sqlContext = new SQLContext(sc)
+    import org.apache.spark.mllib.linalg.VectorUDT
+
+    // Loads data
+    val rowRDD = sc.textFile(input).filter(l => l != "")
+      .map(_.split(" ").map(v => java.lang.Double.parseDouble(v)))
--- End diff --

`_.toDouble` instead of using `java.lang.Double`?
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-125478031 @srowen I made the examples in Scala and Java simpler. Could you review it?
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/7697#discussion_r35619640 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/KMeansExample.scala --- @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.examples.ml + +import org.apache.spark.{SparkContext, SparkConf} +import org.apache.spark.mllib.linalg.Vectors +import org.apache.spark.ml.clustering.KMeans +import org.apache.spark.sql.{Row, SQLContext} +import org.apache.spark.sql.types.{StructField, StructType} + + +/** + * An example demonstrating a k-means clustering. 
+ * Run with
+ * {{{
+ * bin/run-example ml.KMeansExample <file> <k>
+ * }}}
+ */
+object KMeansExample {
+
+  final val FEATURES_COL = "features"
+
+  def main(args: Array[String]): Unit = {
+    if (args.length != 2) {
+      // scalastyle:off println
+      System.err.println("Usage: ml.KMeansExample <file> <k>")
+      // scalastyle:of println
+      System.exit(1)
+    }
+    val input = args(0)
+    val k = args(1).toInt
+
+    // Creates a Spark context and a SQL context
+    val conf = new SparkConf().setAppName(s"${this.getClass.getSimpleName}")
+    val sc = new SparkContext(conf)
+    val sqlContext = new SQLContext(sc)
+    import org.apache.spark.mllib.linalg.VectorUDT
+
+    // Loads data
+    val rowRDD = sc.textFile(input).filter(l => l != "")
+      .map(_.split(" ").map(v => java.lang.Double.parseDouble(v)))
--- End diff --

I think the filter condition can be tightened to `_.nonEmpty`? and likewise below can you `map(Vectors.dense)`? I forget whether that syntax will work.
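The same two tightenings have a direct analogue on the Java side: an explicit emptiness check in place of a comparison against `""`, and a method reference in place of a lambda that only forwards its argument. The sketch below uses plain Java streams rather than Spark RDDs, so it illustrates the idea rather than the pull request's code; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class NonEmptyFilterSketch {
    /** Parses one space-separated line into its numeric values. */
    static double[] parse(String line) {
        return Arrays.stream(line.split(" "))
            .mapToDouble(Double::parseDouble) // method reference instead of v -> Double.parseDouble(v)
            .toArray();
    }

    /** Drops empty lines, then parses the rest. */
    static List<double[]> load(List<String> lines) {
        return lines.stream()
            .filter(l -> !l.isEmpty())        // emptiness check instead of comparing with ""
            .map(NonEmptyFilterSketch::parse) // method reference instead of l -> parse(l)
            .collect(Collectors.toList());
    }
}
```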
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/7697#discussion_r35619518 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java --- @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.examples.ml; + +import java.util.regex.Pattern; + +import org.apache.spark.SparkConf; +import org.apache.spark.api.java.JavaRDD; +import org.apache.spark.api.java.JavaSparkContext; +import org.apache.spark.api.java.function.Function; +import org.apache.spark.ml.clustering.KMeansModel; +import org.apache.spark.mllib.linalg.Vector; +import org.apache.spark.mllib.linalg.VectorUDT; +import org.apache.spark.mllib.linalg.Vectors; +import org.apache.spark.sql.DataFrame; +import org.apache.spark.sql.Row; +import org.apache.spark.sql.SQLContext; +import org.apache.spark.sql.catalyst.expressions.GenericRow; +import org.apache.spark.sql.types.Metadata; +import org.apache.spark.sql.types.StructField; +import org.apache.spark.sql.types.StructType; + + +/** + * An example demonstrating a k-means clustering. 
+ * Run with
+ * <pre>
+ * bin/run-example ml.JavaSimpleParamsExample <file> <k>
+ * </pre>
+ */
+public class JavaKMeansExample {
+
+  private static class ParsePoint implements Function<String, Row> {
+    private static Pattern separater = Pattern.compile(" ");
--- End diff --

Nit: The spelling should still be `separator` and it can be `final`.
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/7697#discussion_r35619529 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java --- @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.examples.ml; + +import java.util.regex.Pattern; + +import org.apache.spark.SparkConf; +import org.apache.spark.api.java.JavaRDD; +import org.apache.spark.api.java.JavaSparkContext; +import org.apache.spark.api.java.function.Function; +import org.apache.spark.ml.clustering.KMeansModel; +import org.apache.spark.mllib.linalg.Vector; +import org.apache.spark.mllib.linalg.VectorUDT; +import org.apache.spark.mllib.linalg.Vectors; +import org.apache.spark.sql.DataFrame; +import org.apache.spark.sql.Row; +import org.apache.spark.sql.SQLContext; +import org.apache.spark.sql.catalyst.expressions.GenericRow; +import org.apache.spark.sql.types.Metadata; +import org.apache.spark.sql.types.StructField; +import org.apache.spark.sql.types.StructType; + + +/** + * An example demonstrating a k-means clustering. 
+ * Run with
+ * <pre>
+ * bin/run-example ml.JavaSimpleParamsExample <file> <k>
+ * </pre>
+ */
+public class JavaKMeansExample {
+
+  private static class ParsePoint implements Function<String, Row> {
+    private static Pattern separater = Pattern.compile(" ");
+
+    @Override
+    public Row call(String line) {
+      String[] tok = separater.split(line);
+      double[] point = new double[tok.length];
+      for (int i = 0; i < tok.length; ++i) {
+        point[i] = Double.parseDouble(tok[i]);
+      }
+      Vector[] points = {Vectors.dense(point)};
+      Row row = new GenericRow(points);
--- End diff --

While we're here, this can be returned directly, skipping a local var
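A minimal sketch of the suggested shape, in plain Java so that it runs without Spark. `PointRow` is a hypothetical stand-in for the one-column `GenericRow` in the diff, and the class name is likewise an assumption made for illustration.

```java
import java.util.regex.Pattern;

final class DirectReturnSketch {
    private static final Pattern SEPARATOR = Pattern.compile(" ");

    /** Hypothetical stand-in for a one-column row; not a Spark class. */
    static final class PointRow {
        final double[] features;

        PointRow(double[] features) {
            this.features = features;
        }
    }

    static PointRow call(String line) {
        String[] tok = SEPARATOR.split(line);
        double[] point = new double[tok.length];
        for (int i = 0; i < tok.length; ++i) {
            point[i] = Double.parseDouble(tok[i]);
        }
        // Returned directly; no intermediate `PointRow row = ...; return row;`.
        return new PointRow(point);
    }
}
```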
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/7697#discussion_r35619542

--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml;
+
+import java.util.regex.Pattern;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.ml.clustering.KMeansModel;
+import org.apache.spark.mllib.linalg.Vector;
+import org.apache.spark.mllib.linalg.VectorUDT;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.catalyst.expressions.GenericRow;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+
+/**
+ * An example demonstrating a k-means clustering.
+ * Run with
+ * <pre>
+ * bin/run-example ml.JavaSimpleParamsExample <file> <k>
+ * </pre>
+ */
+public class JavaKMeansExample {
+
+  private static class ParsePoint implements Function<String, Row> {
+    private static Pattern separater = Pattern.compile(" ");
+
+    @Override
+    public Row call(String line) {
+      String[] tok = separater.split(line);
+      double[] point = new double[tok.length];
+      for (int i = 0; i < tok.length; ++i) {
+        point[i] = Double.parseDouble(tok[i]);
+      }
+      Vector[] points = {Vectors.dense(point)};
+      Row row = new GenericRow(points);
+      return row;
+    }
+  }
+
+  public static void main(String[] args) {
+    if (args.length != 2) {
+      System.err.println("Usage: ml.JavaKMeansExample <file> <k>");
+      System.exit(1);
+    }
+    String inputFile = args[0];
+    int k = Integer.parseInt(args[1]);
+
+    // Parses the arguments
+    SparkConf conf = new SparkConf().setAppName("JavaKMeansExample");
+    JavaSparkContext jsc = new JavaSparkContext(conf);
+    SQLContext sqlContext = new SQLContext(jsc);
+
+    // Loads data
+    JavaRDD<Row> points = jsc.textFile(inputFile).map(new ParsePoint());
+    StructField[] fields = new StructField[1];
+    fields[0] = new StructField("features", new VectorUDT(), false, Metadata.empty());
--- End diff --

You can use the same `Foo[] = { ... };` declaration as above here?
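The `Foo[] = { ... };` form mentioned in the comment is Java's array initializer. A minimal sketch of the two forms follows, assuming a hypothetical `Field` class in place of Spark's `StructField` so that the snippet is self-contained; the class and method names are illustrative only.

```java
final class ArrayInitializerSketch {
  /** Hypothetical stand-in for StructField, reduced to a name and a nullability flag. */
  static final class Field {
    final String name;
    final boolean nullable;

    Field(String name, boolean nullable) {
      this.name = name;
      this.nullable = nullable;
    }
  }

  /** Form in the diff: allocate the array, then assign each element by index. */
  static Field[] allocateThenAssign() {
    Field[] fields = new Field[1];
    fields[0] = new Field("features", false);
    return fields;
  }

  /** Suggested form: an array initializer, which is only allowed in a declaration. */
  static Field[] withInitializer() {
    Field[] fields = {new Field("features", false)};
    return fields;
  }

  public static void main(String[] args) {
    // Both arrays have one element describing the same column.
    System.out.println(allocateThenAssign()[0].name.equals(withInitializer()[0].name));
  }
}
```

The initializer form keeps the array length and its contents in one place, so the two cannot drift apart when a field is added.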
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/7697#discussion_r35619559

--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml;
+
+import java.util.regex.Pattern;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.ml.clustering.KMeansModel;
+import org.apache.spark.mllib.linalg.Vector;
+import org.apache.spark.mllib.linalg.VectorUDT;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.catalyst.expressions.GenericRow;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+
+/**
+ * An example demonstrating a k-means clustering.
+ * Run with
+ * <pre>
+ * bin/run-example ml.JavaSimpleParamsExample <file> <k>
+ * </pre>
+ */
+public class JavaKMeansExample {
+
+  private static class ParsePoint implements Function<String, Row> {
+    private static Pattern separater = Pattern.compile(" ");
+
+    @Override
+    public Row call(String line) {
+      String[] tok = separater.split(line);
+      double[] point = new double[tok.length];
+      for (int i = 0; i < tok.length; ++i) {
+        point[i] = Double.parseDouble(tok[i]);
+      }
+      Vector[] points = {Vectors.dense(point)};
+      Row row = new GenericRow(points);
+      return row;
+    }
+  }
+
+  public static void main(String[] args) {
+    if (args.length != 2) {
+      System.err.println("Usage: ml.JavaKMeansExample <file> <k>");
+      System.exit(1);
+    }
+    String inputFile = args[0];
+    int k = Integer.parseInt(args[1]);
+
+    // Parses the arguments
+    SparkConf conf = new SparkConf().setAppName("JavaKMeansExample");
+    JavaSparkContext jsc = new JavaSparkContext(conf);
+    SQLContext sqlContext = new SQLContext(jsc);
+
+    // Loads data
+    JavaRDD<Row> points = jsc.textFile(inputFile).map(new ParsePoint());
+    StructField[] fields = new StructField[1];
+    fields[0] = new StructField("features", new VectorUDT(), false, Metadata.empty());
+    StructType schema = new StructType(fields);
+    DataFrame dataset = sqlContext.createDataFrame(points, schema);
+
+    // Trains a k-means model
+    org.apache.spark.ml.clustering.KMeans kmeans = new org.apache.spark.ml.clustering.KMeans()
--- End diff --

Can this be an import or am I missing why it has to be fully qualified here?
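For context on the question above: a fully qualified name is only required when two classes with the same simple name are used in one file. The import list quoted in this message contains no other `KMeans`, so, as far as the quoted lines show, a plain import looks possible. The sketch below illustrates the general rule with two JDK classes that share a simple name; `QualifiedNameSketch` is a made-up name and the choice of `Date` is only an analogy, not taken from the pull request.

```java
import java.util.Date; // only one class named Date can be imported by its simple name

final class QualifiedNameSketch {
  /** java.util.Date is usable by its simple name thanks to the import above. */
  static Date utilDate(long millis) {
    return new Date(millis);
  }

  /** java.sql.Date shares the simple name, so it has to be written out in full. */
  static java.sql.Date sqlDate(long millis) {
    return new java.sql.Date(millis);
  }

  public static void main(String[] args) {
    // Both objects represent the same instant even though their classes differ.
    System.out.println(utilDate(0L).getTime() == sqlDate(0L).getTime());
  }
}
```

When only one of the clashing classes is needed, importing it and dropping the qualification is the shorter form.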
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-125849111 Merged build triggered.
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-125849122 Merged build started.
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-125849265 [Test build #38807 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38807/consoleFull) for PR 7697 at commit [`554e574`](https://github.com/apache/spark/commit/554e574646a9ed552cc7d94ac9ece2f8124f8c96).
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-125226470 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-125225163 Merged build triggered.
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-125227555 Merged build triggered.
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-125227612 Merged build started.
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/7697#discussion_r35547864

--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml;
+
+import java.util.regex.Pattern;
+
+import org.apache.commons.cli.*;
--- End diff --

I think we should not introduce a dependency on Commons CLI just for this. It's ancient. Actually, I see one other use of this in an example, which shouldn't be there as this is an undeclared dependency. It's not crazy to fix that here; it could be a separate PR though.
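One way to avoid a command-line parsing library in a small example is to accept positional arguments only, which is what the 98-line revision of this file quoted elsewhere in the thread does with an `args.length` check. The following is a minimal sketch of that idea; `ArgsSketch`, `Params`, and the usage text are hypothetical names chosen for illustration, and throwing an exception instead of calling `System.exit` is a choice made here to keep the sketch easy to test.

```java
final class ArgsSketch {
  /** Parsed command line: an input path and the number of clusters. */
  static final class Params {
    final String input;
    final int k;

    Params(String input, int k) {
      this.input = input;
      this.k = k;
    }
  }

  /** Expects exactly two positional arguments: a file path and an integer k. */
  static Params parse(String[] args) {
    if (args.length != 2) {
      throw new IllegalArgumentException("Usage: ArgsSketch <file> <k>");
    }
    // Integer.parseInt throws NumberFormatException if k is not an integer.
    return new Params(args[0], Integer.parseInt(args[1]));
  }

  public static void main(String[] args) {
    Params p = parse(new String[] {"data.txt", "3"});
    System.out.println(p.input + " " + p.k);
  }
}
```

This trades named options and defaults for zero extra dependencies, which is usually acceptable for a short example program.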
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/7697#discussion_r35548056

--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml;
+
+import java.util.regex.Pattern;
+
+import org.apache.commons.cli.*;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.ml.clustering.KMeansModel;
+import org.apache.spark.mllib.clustering.KMeans;
+import org.apache.spark.mllib.linalg.Vector;
+import org.apache.spark.mllib.linalg.VectorUDT;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.catalyst.expressions.GenericRow;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+
+/**
+ * An example demonstrating a k-means clustering.
+ * Run with
+ * {{{
+ * bin/run-example ml.JavaSimpleParamsExample [options] -k int -input file
+ * }}}
+ */
+public class JavaKMeansExample {
+
+  private static class Params {
+    String input;
+    Integer k = 2;
+    Integer maxIter = 20;
+    Integer runs = 1;
+    Double epsilon = 1E-4;
+    Long seed = 1L;
+    String initMode = KMeans.K_MEANS_PARALLEL();
+    Integer initSteps = 5;
+  }
+
+  private static class ParsePoint implements Function<String, Row> {
+    Pattern separater = Pattern.compile(" ");
--- End diff --

Nit: separator. This should be private static if possible.
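A minimal sketch of the nit above, with the spelling fixed and the pattern held in a `private static` field. The `final` modifier and the upper-case constant name are additions made here, not something the comment asks for, and `SeparatorSketch` is a hypothetical class name.

```java
import java.util.regex.Pattern;

final class SeparatorSketch {
  // Compiled once when the class is loaded instead of once per instance.
  // Static fields are also not part of an object's serialized form.
  private static final Pattern SEPARATOR = Pattern.compile(" ");

  /** Splits a space-separated line into its tokens. */
  static String[] tokens(String line) {
    return SEPARATOR.split(line);
  }

  public static void main(String[] args) {
    System.out.println(tokens("1.0 2.0 3.0").length);
  }
}
```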
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/7697#discussion_r35548012

--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml;
+
+import java.util.regex.Pattern;
+
+import org.apache.commons.cli.*;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.ml.clustering.KMeansModel;
+import org.apache.spark.mllib.clustering.KMeans;
+import org.apache.spark.mllib.linalg.Vector;
+import org.apache.spark.mllib.linalg.VectorUDT;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.catalyst.expressions.GenericRow;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+
+/**
+ * An example demonstrating a k-means clustering.
+ * Run with
+ * {{{
--- End diff --

Unless I'm really out of date, this doesn't work in Javadoc. `{@code ...}` works. Though that is not used in Spark consistently.
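As a sketch of the Javadoc form the comment refers to: `{@code ...}` marks text as code and leaves characters such as `<` and `>` unescaped, and wrapping it in `<pre>` is a common idiom for multi-line blocks. The class, method, and command line below are hypothetical and serve only to carry the comment.

```java
final class JavadocSketch {
  /**
   * Builds the command line for running an example class.
   * Run with
   * <pre>{@code
   * bin/run-example <class> <file> <k>
   * }</pre>
   *
   * @param exampleClass the example class name to run (an illustrative parameter)
   * @return the command line as a single string
   */
  static String usage(String exampleClass) {
    return "bin/run-example " + exampleClass + " <file> <k>";
  }

  public static void main(String[] args) {
    System.out.println(usage("ml.JavaKMeansExample"));
  }
}
```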
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-125225198 Merged build started.
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-125236591

[Test build #38551 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38551/console) for PR 7697 at commit [`b09ec13`](https://github.com/apache/spark/commit/b09ec134701d18b0710ff3930fd8e1dd4254ab08).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `public class JavaKMeansExample `
   * ` case class Params(`
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-125226098 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-125228083 Merged build triggered.
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-125227789 [Test build #115 has started](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/115/consoleFull) for PR 7697 at commit [`b09ec13`](https://github.com/apache/spark/commit/b09ec134701d18b0710ff3930fd8e1dd4254ab08).
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-125233978

[Test build #115 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/115/console) for PR 7697 at commit [`b09ec13`](https://github.com/apache/spark/commit/b09ec134701d18b0710ff3930fd8e1dd4254ab08).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `public class JavaKMeansExample `
   * ` case class Params(`
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-125234331 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
GitHub user yu-iskw opened a pull request: https://github.com/apache/spark/pull/7697

[SPARK-9149][ML][Examples] Add an example of spark.ml KMeans

[SPARK-9149] Add an example of spark.ml KMeans - ASF JIRA
https://issues.apache.org/jira/browse/SPARK-9149

@jkbradley Should we support other data formats, such as TSV or CSV? I have implemented these examples to support only space-separated files, which is the same format as the example for `spark.mllib`'s `KMeans`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yu-iskw/spark SPARK-9149

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/7697.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #7697

commit b09ec134701d18b0710ff3930fd8e1dd4254ab08
Author: Yu ISHIKAWA <yuu.ishik...@gmail.com>
Date: 2015-07-24T08:59:01Z

    [SPARK-9149][ML][Examples] Add an example of spark.ml KMeans
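On the TSV/CSV question raised in the description, one possible direction is to make the delimiter a parameter of the parser. The sketch below is only an illustration of that idea and is not code from the pull request: `DelimitedPointParser` is a hypothetical class, it returns a plain `double[]` rather than a Spark row, and it does not handle quoted CSV fields.

```java
import java.util.regex.Pattern;

final class DelimitedPointParser {
  private final Pattern separator;

  /** The delimiter is a regular expression, e.g. " " for spaces, "\t" for TSV, "," for CSV. */
  DelimitedPointParser(String delimiterRegex) {
    this.separator = Pattern.compile(delimiterRegex);
  }

  /** Splits one line on the configured delimiter and parses each token as a double. */
  double[] parse(String line) {
    String[] tok = separator.split(line.trim());
    double[] point = new double[tok.length];
    for (int i = 0; i < tok.length; ++i) {
      point[i] = Double.parseDouble(tok[i].trim());
    }
    return point;
  }

  public static void main(String[] args) {
    double[] p = new DelimitedPointParser(",").parse("1.0, 2.0, 3.0");
    System.out.println(p.length);
  }
}
```

Keeping the space-separated default, as the description does, preserves compatibility with the existing `spark.mllib` example data while leaving room for other delimiters.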
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-125228153 Merged build started.
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-125229686 [Test build #38551 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38551/consoleFull) for PR 7697 at commit [`b09ec13`](https://github.com/apache/spark/commit/b09ec134701d18b0710ff3930fd8e1dd4254ab08).
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-125236710 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user yu-iskw commented on a diff in the pull request: https://github.com/apache/spark/pull/7697#discussion_r35598240

--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml;
+
+import java.util.regex.Pattern;
+
+import org.apache.commons.cli.*;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.ml.clustering.KMeansModel;
+import org.apache.spark.mllib.clustering.KMeans;
+import org.apache.spark.mllib.linalg.Vector;
+import org.apache.spark.mllib.linalg.VectorUDT;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.catalyst.expressions.GenericRow;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+
+/**
+ * An example demonstrating a k-means clustering.
+ * Run with
+ * {{{
--- End diff --

I confused it with Scaladocs. I'll fix it. Thanks.
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user yu-iskw commented on a diff in the pull request:

    https://github.com/apache/spark/pull/7697#discussion_r35598247

    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
    @@ -0,0 +1,215 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.examples.ml;
    +
    +import java.util.regex.Pattern;
    +
    +import org.apache.commons.cli.*;
    +
    +import org.apache.spark.SparkConf;
    +import org.apache.spark.api.java.JavaRDD;
    +import org.apache.spark.api.java.JavaSparkContext;
    +import org.apache.spark.api.java.function.Function;
    +import org.apache.spark.ml.clustering.KMeansModel;
    +import org.apache.spark.mllib.clustering.KMeans;
    +import org.apache.spark.mllib.linalg.Vector;
    +import org.apache.spark.mllib.linalg.VectorUDT;
    +import org.apache.spark.mllib.linalg.Vectors;
    +import org.apache.spark.sql.DataFrame;
    +import org.apache.spark.sql.Row;
    +import org.apache.spark.sql.SQLContext;
    +import org.apache.spark.sql.catalyst.expressions.GenericRow;
    +import org.apache.spark.sql.types.Metadata;
    +import org.apache.spark.sql.types.StructField;
    +import org.apache.spark.sql.types.StructType;
    +
    +
    +/**
    + * An example demonstrating a k-means clustering.
    + * Run with
    + * {{{
    + * bin/run-example ml.JavaSimpleParamsExample [options] -k int -input file
    + * }}}
    + */
    +public class JavaKMeansExample {
    +
    +  private static class Params {
    +    String input;
    +    Integer k = 2;
    +    Integer maxIter = 20;
    +    Integer runs = 1;
    +    Double epsilon = 1E-4;
    +    Long seed = 1L;
    +    String initMode = KMeans.K_MEANS_PARALLEL();
    +    Integer initSteps = 5;
    +  }
    +
    +  private static class ParsePoint implements Function<String, Row> {
    +    Pattern separater = Pattern.compile(" ");
    --- End diff --

    Alright.
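The `ParsePoint` class in the diff above is cut off at its `separater` field. As a rough illustration only, and not the PR's actual code, the sketch below shows one way a space-separated line could be turned into an array of doubles with a precompiled `Pattern`. The class name `SpaceSeparatedParser` and the method `parse` are hypothetical, and the conversion into a Spark `Row` is left out so that the snippet runs without Spark on the classpath.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the pull request.
public class SpaceSeparatedParser {

    // Compile the separator once and reuse it for every input line.
    private static final Pattern SEPARATOR = Pattern.compile(" ");

    // Split a line such as "0.0 0.1 0.2" on single spaces and parse each token.
    // Assumption: tokens are separated by exactly one space.
    public static double[] parse(String line) {
        String[] tokens = SEPARATOR.split(line.trim());
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            values[i] = Double.parseDouble(tokens[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        // prints [0.0, 0.1, 0.2]
        System.out.println(Arrays.toString(parse("0.0 0.1 0.2")));
    }
}
```

With a single-space pattern, two consecutive spaces produce an empty token and `Double.parseDouble` throws a `NumberFormatException`; a pattern such as `\\s+` would be more tolerant of irregular spacing.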
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-125399245 Merged build triggered.
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-125401233

[Test build #38619 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38619/console) for PR 7697 at commit [`3e0862d`](https://github.com/apache/spark/commit/3e0862d760f72f0026af34fe5b9e9c2889fcf4fd).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `public class JavaKMeansExample`
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-125401272 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-125399260 Merged build started.
[GitHub] spark pull request: [SPARK-9149][ML][Examples] Add an example of s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7697#issuecomment-125399716 [Test build #38619 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38619/consoleFull) for PR 7697 at commit [`3e0862d`](https://github.com/apache/spark/commit/3e0862d760f72f0026af34fe5b9e9c2889fcf4fd).