[https://issues.apache.org/jira/browse/FLINK-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111848#comment-15111848]
ASF GitHub Bot commented on FLINK-2021:
---------------------------------------
GitHub user chiwanpark commented on a diff in the pull request:
https://github.com/apache/flink/pull/1536#discussion_r50498429
--- Diff: flink-examples/flink-examples-batch/src/main/scala/org/apache/flink/examples/scala/clustering/KMeans.scala ---
@@ -26,53 +27,84 @@ import org.apache.flink.examples.java.clustering.util.KMeansData
import scala.collection.JavaConverters._
/**
- * This example implements a basic K-Means clustering algorithm.
- *
- * K-Means is an iterative clustering algorithm and works as follows:
- * K-Means is given a set of data points to be clustered and an initial set of ''K'' cluster
- * centers.
- * In each iteration, the algorithm computes the distance of each data point to each cluster center.
- * Each point is assigned to the cluster center which is closest to it.
- * Subsequently, each cluster center is moved to the center (''mean'') of all points that have
- * been assigned to it.
- * The moved cluster centers are fed into the next iteration.
- * The algorithm terminates after a fixed number of iterations (as in this implementation)
- * or if cluster centers do not (significantly) move in an iteration.
- * This is the Wikipedia entry for the [[http://en.wikipedia
- * .org/wiki/K-means_clustering K-Means Clustering algorithm]].
- *
- * This implementation works on two-dimensional data points.
- * It computes an assignment of data points to cluster centers, i.e.,
- * each data point is annotated with the id of the final cluster (center) it belongs to.
- *
- * Input files are plain text files and must be formatted as follows:
- *
- * - Data points are represented as two double values separated by a blank character.
- * Data points are separated by newline characters.
- * For example `"1.2 2.3\n5.3 7.2\n"` gives two data points (x=1.2, y=2.3) and (x=5.3,
- * y=7.2).
- * - Cluster centers are represented by an integer id and a point value.
- * For example `"1 6.2 3.2\n2 2.9 5.7\n"` gives two centers (id=1, x=6.2,
- * y=3.2) and (id=2, x=2.9, y=5.7).
- *
- * Usage:
- * {{{
- * KMeans <points path> <centers path> <result path> <num iterations>
- * }}}
- * If no parameters are provided, the program is run with default data from
- * [[org.apache.flink.examples.java.clustering.util.KMeansData]]
- * and 10 iterations.
- *
- * This example shows how to use:
- *
- * - Bulk iterations
- * - Broadcast variables in bulk iterations
- * - Custom Java objects (PoJos)
- */
+ * This example implements a basic K-Means clustering algorithm.
+ *
+ * K-Means is an iterative clustering algorithm and works as follows:
+ * K-Means is given a set of data points to be clustered and an initial set of ''K'' cluster
+ * centers.
+ * In each iteration, the algorithm computes the distance of each data point to each cluster center.
+ * Each point is assigned to the cluster center which is closest to it.
+ * Subsequently, each cluster center is moved to the center (''mean'') of all points that have
+ * been assigned to it.
+ * The moved cluster centers are fed into the next iteration.
+ * The algorithm terminates after a fixed number of iterations (as in this implementation)
+ * or if cluster centers do not (significantly) move in an iteration.
+ * This is the Wikipedia entry for the [[http://en.wikipedia
+ * .org/wiki/K-means_clustering K-Means Clustering algorithm]].
+ *
+ * This implementation works on two-dimensional data points.
+ * It computes an assignment of data points to cluster centers, i.e.,
+ * each data point is annotated with the id of the final cluster (center) it belongs to.
+ *
+ * Input files are plain text files and must be formatted as follows:
+ *
+ * - Data points are represented as two double values separated by a blank character.
+ * Data points are separated by newline characters.
+ * For example `"1.2 2.3\n5.3 7.2\n"` gives two data points (x=1.2, y=2.3) and (x=5.3,
+ * y=7.2).
+ * - Cluster centers are represented by an integer id and a point value.
+ * For example `"1 6.2 3.2\n2 2.9 5.7\n"` gives two centers (id=1, x=6.2,
+ * y=3.2) and (id=2, x=2.9, y=5.7).
+ *
+ * Usage:
+ * {{{
+ * KMeans <points path> <centers path> <result path> <num iterations>
+ * }}}
+ * If no parameters are provided, the program is run with default data from
+ * [[org.apache.flink.examples.java.clustering.util.KMeansData]]
+ * and 10 iterations.
+ *
+ * This example shows how to use:
+ *
+ * - Bulk iterations
+ * - Broadcast variables in bulk iterations
+ * - Custom Java objects (PoJos)
--- End diff --
This example uses Scala objects, not custom Java objects (POJOs). Could you change this line accordingly?
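For reference, the loop described in the Scaladoc above can be sketched in plain Java. This is a minimal, dependency-free sketch under stated assumptions, not the Flink example itself. It does not use bulk iterations or broadcast variables. The class `KMeansSketch` and its methods are hypothetical names. A center that receives no points simply keeps its position, which is a choice made here; the Flink program may handle that case differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java sketch (no Flink) of the K-Means loop in the Scaladoc:
// parse the two text input formats, then run a fixed number of iterations.
class KMeansSketch {

    // A two-dimensional data point.
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
        double squaredDistance(Point o) {
            double dx = x - o.x, dy = y - o.y;
            return dx * dx + dy * dy;
        }
    }

    // A cluster center: an integer id plus a point value.
    static final class Centroid {
        final int id;
        final Point center;
        Centroid(int id, Point center) { this.id = id; this.center = center; }
    }

    // "1.2 2.3\n5.3 7.2\n" -> two points; fields are separated by blanks.
    static List<Point> parsePoints(String text) {
        List<Point> points = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (line.trim().isEmpty()) continue;
            String[] f = line.trim().split("\\s+");
            points.add(new Point(Double.parseDouble(f[0]), Double.parseDouble(f[1])));
        }
        return points;
    }

    // "1 6.2 3.2\n2 2.9 5.7\n" -> two centers (id, x, y).
    static List<Centroid> parseCentroids(String text) {
        List<Centroid> centroids = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (line.trim().isEmpty()) continue;
            String[] f = line.trim().split("\\s+");
            centroids.add(new Centroid(Integer.parseInt(f[0]),
                    new Point(Double.parseDouble(f[1]), Double.parseDouble(f[2]))));
        }
        return centroids;
    }

    // Index (into the list) of the center closest to p.
    static int nearest(Point p, List<Centroid> centroids) {
        int best = 0;
        for (int i = 1; i < centroids.size(); i++) {
            if (p.squaredDistance(centroids.get(i).center)
                    < p.squaredDistance(centroids.get(best).center)) {
                best = i;
            }
        }
        return best;
    }

    // One iteration: assign every point to its nearest center, then move
    // each center to the mean of the points assigned to it.
    static List<Centroid> step(List<Point> points, List<Centroid> centroids) {
        int k = centroids.size();
        double[] sumX = new double[k], sumY = new double[k];
        int[] count = new int[k];
        for (Point p : points) {
            int i = nearest(p, centroids);
            sumX[i] += p.x;
            sumY[i] += p.y;
            count[i]++;
        }
        List<Centroid> moved = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            Centroid c = centroids.get(i);
            // Choice made in this sketch: a center with no points stays put.
            moved.add(count[i] == 0 ? c
                    : new Centroid(c.id, new Point(sumX[i] / count[i], sumY[i] / count[i])));
        }
        return moved;
    }

    // Terminates after a fixed number of iterations, as the example does.
    static List<Centroid> run(List<Point> points, List<Centroid> centroids, int iterations) {
        for (int n = 0; n < iterations; n++) {
            centroids = step(points, centroids);
        }
        return centroids;
    }

    // Final assignment: annotate each point with the id of its cluster.
    static List<Integer> assign(List<Point> points, List<Centroid> centroids) {
        List<Integer> ids = new ArrayList<>();
        for (Point p : points) {
            ids.add(centroids.get(nearest(p, centroids)).id);
        }
        return ids;
    }
}
```

For example, with the points `"0 0\n0 2\n10 0\n10 2\n"` and the centers `"1 0 0\n2 10 0\n"`, the first iteration moves the centers to (0, 1) and (10, 1). The assignments do not change after that, so the centers stay there.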
> Rework examples to use ParameterTool
> ------------------------------------
>
> Key: FLINK-2021
> URL: https://issues.apache.org/jira/browse/FLINK-2021
> Project: Flink
> Issue Type: Improvement
> Components: Examples
> Affects Versions: 0.9
> Reporter: Robert Metzger
> Priority: Minor
> Labels: starter
>
> In FLINK-1525, we introduced the {{ParameterTool}}.
> We should port the examples to use the tool.
> The examples could look like this (we should probably discuss this on the
> mailing list first):
> {code}
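> import org.apache.flink.api.java.ExecutionEnvironment;
> import org.apache.flink.api.java.utils.ParameterTool;
>
> public static void main(String[] args) throws Exception {
>     // parse "--key value" pairs, e.g. --input <path> --output <path>
>     ParameterTool pt = ParameterTool.fromArgs(args);
>     // read from and write to files only if exactly two parameters were given
>     boolean fileOutput = pt.getNumberOfParameters() == 2;
>     String textPath = null;
>     String outputPath = null;
>     if (fileOutput) {
>         // getRequired() throws if the key is missing
>         textPath = pt.getRequired("input");
>         outputPath = pt.getRequired("output");
>     }
>
>     // set up the execution environment
>     final ExecutionEnvironment env =
>         ExecutionEnvironment.getExecutionEnvironment();
>     // register the parameters with the job configuration
>     env.getConfig().setGlobalJobParameters(pt);
>
>     // ... define and execute the program
> }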
> {code}
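The proposal above chooses between file I/O and built-in default data by counting parameters. The sketch below shows that decision logic without a Flink dependency. `SimpleParams` is a hypothetical, simplified stand-in for `ParameterTool` that only accepts `--key value` pairs, and `ParamsDemo` is also illustrative. Neither is Flink's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ParameterTool: accepts "--key value" pairs only.
class SimpleParams {
    private final Map<String, String> values = new HashMap<>();

    static SimpleParams fromArgs(String[] args) {
        SimpleParams params = new SimpleParams();
        for (int i = 0; i < args.length; i += 2) {
            if (!args[i].startsWith("--") || i + 1 >= args.length) {
                throw new IllegalArgumentException("Expected --key value at: " + args[i]);
            }
            params.values.put(args[i].substring(2), args[i + 1]);
        }
        return params;
    }

    int getNumberOfParameters() {
        return values.size();
    }

    // Fails fast when a required key is absent.
    String getRequired(String key) {
        String value = values.get(key);
        if (value == null) {
            throw new IllegalStateException("No value for required key '" + key + "'");
        }
        return value;
    }
}

// Same decision logic as the proposal: use files only if two parameters are given.
class ParamsDemo {
    // Returns {inputPath, outputPath}, or null to signal "use default data".
    static String[] paths(String[] args) {
        SimpleParams pt = SimpleParams.fromArgs(args);
        boolean fileOutput = pt.getNumberOfParameters() == 2;
        if (!fileOutput) {
            return null;
        }
        return new String[] {pt.getRequired("input"), pt.getRequired("output")};
    }
}
```

The `getNumberOfParameters() == 2` check only looks at the count. Arguments such as `--input a --foo b` pass it. In this sketch, the subsequent `getRequired("output")` call is what rejects them.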
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)