[ https://issues.apache.org/jira/browse/FLINK-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111848#comment-15111848 ]
ASF GitHub Bot commented on FLINK-2021:
---------------------------------------

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1536#discussion_r50498429

--- Diff: flink-examples/flink-examples-batch/src/main/scala/org/apache/flink/examples/scala/clustering/KMeans.scala ---
@@ -26,53 +27,84 @@ import org.apache.flink.examples.java.clustering.util.KMeansData
 import scala.collection.JavaConverters._
 
 /**
- * This example implements a basic K-Means clustering algorithm.
- *
- * K-Means is an iterative clustering algorithm and works as follows:
- * K-Means is given a set of data points to be clustered and an initial set of ''K'' cluster
- * centers.
- * In each iteration, the algorithm computes the distance of each data point to each cluster center.
- * Each point is assigned to the cluster center which is closest to it.
- * Subsequently, each cluster center is moved to the center (''mean'') of all points that have
- * been assigned to it.
- * The moved cluster centers are fed into the next iteration.
- * The algorithm terminates after a fixed number of iterations (as in this implementation)
- * or if cluster centers do not (significantly) move in an iteration.
- * This is the Wikipedia entry for the [[http://en.wikipedia
- * .org/wiki/K-means_clustering K-Means Clustering algorithm]].
- *
- * This implementation works on two-dimensional data points.
- * It computes an assignment of data points to cluster centers, i.e.,
- * each data point is annotated with the id of the final cluster (center) it belongs to.
- *
- * Input files are plain text files and must be formatted as follows:
- *
- * - Data points are represented as two double values separated by a blank character.
- * Data points are separated by newline characters.
- * For example `"1.2 2.3\n5.3 7.2\n"` gives two data points (x=1.2, y=2.3) and (x=5.3,
- * y=7.2).
- * - Cluster centers are represented by an integer id and a point value.
- * For example `"1 6.2 3.2\n2 2.9 5.7\n"` gives two centers (id=1, x=6.2,
- * y=3.2) and (id=2, x=2.9, y=5.7).
- *
- * Usage:
- * {{{
- * KMeans <points path> <centers path> <result path> <num iterations>
- * }}}
- * If no parameters are provided, the program is run with default data from
- * [[org.apache.flink.examples.java.clustering.util.KMeansData]]
- * and 10 iterations.
- *
- * This example shows how to use:
- *
- * - Bulk iterations
- * - Broadcast variables in bulk iterations
- * - Custom Java objects (PoJos)
- */
+ * This example implements a basic K-Means clustering algorithm.
+ *
+ * K-Means is an iterative clustering algorithm and works as follows:
+ * K-Means is given a set of data points to be clustered and an initial set of ''K'' cluster
+ * centers.
+ * In each iteration, the algorithm computes the distance of each data point to each cluster center.
+ * Each point is assigned to the cluster center which is closest to it.
+ * Subsequently, each cluster center is moved to the center (''mean'') of all points that have
+ * been assigned to it.
+ * The moved cluster centers are fed into the next iteration.
+ * The algorithm terminates after a fixed number of iterations (as in this implementation)
+ * or if cluster centers do not (significantly) move in an iteration.
+ * This is the Wikipedia entry for the [[http://en.wikipedia
+ * .org/wiki/K-means_clustering K-Means Clustering algorithm]].
+ *
+ * This implementation works on two-dimensional data points.
+ * It computes an assignment of data points to cluster centers, i.e.,
+ * each data point is annotated with the id of the final cluster (center) it belongs to.
+ *
+ * Input files are plain text files and must be formatted as follows:
+ *
+ * - Data points are represented as two double values separated by a blank character.
+ * Data points are separated by newline characters.
+ * For example `"1.2 2.3\n5.3 7.2\n"` gives two data points (x=1.2, y=2.3) and (x=5.3,
+ * y=7.2).
+ * - Cluster centers are represented by an integer id and a point value.
+ * For example `"1 6.2 3.2\n2 2.9 5.7\n"` gives two centers (id=1, x=6.2,
+ * y=3.2) and (id=2, x=2.9, y=5.7).
+ *
+ * Usage:
+ * {{{
+ * KMeans <points path> <centers path> <result path> <num iterations>
+ * }}}
+ * If no parameters are provided, the program is run with default data from
+ * [[org.apache.flink.examples.java.clustering.util.KMeansData]]
+ * and 10 iterations.
+ *
+ * This example shows how to use:
+ *
+ * - Bulk iterations
+ * - Broadcast variables in bulk iterations
+ * - Custom Java objects (PoJos)
--- End diff --

We're using "Scala objects". Could you change this line?


> Rework examples to use ParameterTool
> ------------------------------------
>
> Key: FLINK-2021
> URL: https://issues.apache.org/jira/browse/FLINK-2021
> Project: Flink
> Issue Type: Improvement
> Components: Examples
> Affects Versions: 0.9
> Reporter: Robert Metzger
> Priority: Minor
> Labels: starter
>
> In FLINK-1525, we introduced the {{ParameterTool}}.
> We should port the examples to use the tool.
> The examples could look like this (we should maybe discuss it first on the
> mailing lists):
> {code}
> public static void main(String[] args) throws Exception {
>     ParameterTool pt = ParameterTool.fromArgs(args);
>     boolean fileOutput = pt.getNumberOfParameters() == 2;
>     String textPath = null;
>     String outputPath = null;
>     if(fileOutput) {
>         textPath = pt.getRequired("input");
>         outputPath = pt.getRequired("output");
>     }
>     // set up the execution environment
>     final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>     env.getConfig().setUserConfig(pt);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
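The K-Means loop described in the Scaladoc in the diff can be sketched on a single machine in plain Java. Each iteration assigns every point to its nearest center, moves each center to the mean of its assigned points, and repeats for a fixed number of iterations. This is a minimal sketch and is not the Flink example's code. It uses no DataSet, bulk iteration, or broadcast variable. The names `KMeansSketch`, `parsePoints`, `parseCenters`, `nearest`, `step`, and `run` are hypothetical. The sketch parses the two input formats documented in the comment. As an assumption, it leaves a center that receives no points in place; other implementations may drop or reinitialize such a center instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-machine K-Means sketch; not the Flink example's implementation.
final class KMeansSketch {

    /** A two-dimensional data point ("x y" in the input format). */
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
        double squaredDistance(Point o) {
            double dx = x - o.x, dy = y - o.y;
            return dx * dx + dy * dy;
        }
    }

    /** A cluster center: an integer id plus a position ("id x y" in the input format). */
    static final class Centroid {
        final int id;
        final Point pos;
        Centroid(int id, Point pos) { this.id = id; this.pos = pos; }
    }

    /** Parses newline-separated "x y" lines, e.g. "1.2 2.3\n5.3 7.2\n". */
    static List<Point> parsePoints(String text) {
        List<Point> points = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (line.trim().isEmpty()) continue;
            String[] f = line.trim().split("\\s+"); // tolerate repeated blanks
            points.add(new Point(Double.parseDouble(f[0]), Double.parseDouble(f[1])));
        }
        return points;
    }

    /** Parses newline-separated "id x y" lines, e.g. "1 6.2 3.2\n2 2.9 5.7\n". */
    static List<Centroid> parseCenters(String text) {
        List<Centroid> centers = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (line.trim().isEmpty()) continue;
            String[] f = line.trim().split("\\s+");
            centers.add(new Centroid(Integer.parseInt(f[0]),
                    new Point(Double.parseDouble(f[1]), Double.parseDouble(f[2]))));
        }
        return centers;
    }

    /** List index of the center closest to p (assumes at least one center). */
    static int nearest(Point p, List<Centroid> centers) {
        int best = 0;
        for (int i = 1; i < centers.size(); i++) {
            if (p.squaredDistance(centers.get(i).pos) < p.squaredDistance(centers.get(best).pos)) {
                best = i;
            }
        }
        return best;
    }

    /** One iteration: assign each point to its nearest center, then move each center to the mean. */
    static List<Centroid> step(List<Point> points, List<Centroid> centers) {
        int k = centers.size();
        double[] sumX = new double[k], sumY = new double[k];
        int[] count = new int[k];
        for (Point p : points) {
            int c = nearest(p, centers);
            sumX[c] += p.x;
            sumY[c] += p.y;
            count[c]++;
        }
        List<Centroid> moved = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            Centroid old = centers.get(i);
            // Assumption: a center with no assigned points keeps its old position.
            moved.add(count[i] == 0 ? old
                    : new Centroid(old.id, new Point(sumX[i] / count[i], sumY[i] / count[i])));
        }
        return moved;
    }

    /** Runs a fixed number of iterations (no convergence check). */
    static List<Centroid> run(List<Point> points, List<Centroid> centers, int iterations) {
        for (int i = 0; i < iterations; i++) {
            centers = step(points, centers);
        }
        return centers;
    }

    public static void main(String[] args) {
        // Sample inputs taken from the format description in the doc comment.
        List<Point> points = parsePoints("1.2 2.3\n5.3 7.2\n");
        List<Centroid> centers = run(points, parseCenters("1 6.2 3.2\n2 2.9 5.7\n"), 10);
        // Annotate each point with the id of its final cluster: one "id x y" line per point.
        for (Point p : points) {
            System.out.println(centers.get(nearest(p, centers)).id + " " + p.x + " " + p.y);
        }
    }
}
```

The fixed iteration count mirrors the termination rule the doc comment states for this implementation. A convergence check on how far the centers move would be a straightforward addition to `run`.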