Repository: incubator-flink
Updated Branches:
  refs/heads/master 6b6dae026 -> 7af127eac

Updated run-example quickstart (commands, screenshots)

This closes #136

Project: http://git-wip-us.apache.org/repos/asf/incubator-flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-flink/commit/7af127ea
Tree: http://git-wip-us.apache.org/repos/asf/incubator-flink/tree/7af127ea
Diff: http://git-wip-us.apache.org/repos/asf/incubator-flink/diff/7af127ea

Branch: refs/heads/master
Commit: 7af127eac10db3bd794b2406da14423ae0d21cd9
Parents: 6b6dae0
Author: Fabian Hueske <[email protected]>
Authored: Wed Oct 1 14:41:29 2014 +0200
Committer: Fabian Hueske <[email protected]>
Committed: Tue Oct 7 23:12:01 2014 +0200

----------------------------------------------------------------------
 .../compiler-webclient-new.png                 | Bin 192965 -> 123539 bytes
 .../jobmanager-running-new.png                 | Bin 143924 -> 123944 bytes
 docs/img/quickstart-example/kmeans003.png      | Bin 71309 -> 27962 bytes
 docs/img/quickstart-example/kmeans008.png      | Bin 91857 -> 39305 bytes
 docs/img/quickstart-example/kmeans015.png      | Bin 95171 -> 41958 bytes
 docs/img/quickstart-example/result003.png      | Bin 57838 -> 60228 bytes
 docs/img/quickstart-example/result008.png      | Bin 82928 -> 92732 bytes
 docs/img/quickstart-example/result015.png      | Bin 88338 -> 89724 bytes
 docs/img/quickstart-example/run-webclient.png  | Bin 84682 -> 120068 bytes
 docs/run_example_quickstart.md                 | 78 ++++++++++---------
 10 files changed, 43 insertions(+), 35 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/7af127ea/docs/img/quickstart-example/compiler-webclient-new.png
----------------------------------------------------------------------
diff --git a/docs/img/quickstart-example/compiler-webclient-new.png b/docs/img/quickstart-example/compiler-webclient-new.png
index 1141de1..a39bc6a 100644
Binary files a/docs/img/quickstart-example/compiler-webclient-new.png and b/docs/img/quickstart-example/compiler-webclient-new.png differ

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/7af127ea/docs/img/quickstart-example/jobmanager-running-new.png
----------------------------------------------------------------------
diff --git a/docs/img/quickstart-example/jobmanager-running-new.png b/docs/img/quickstart-example/jobmanager-running-new.png
index 5bcf9e1..ac9594c 100644
Binary files a/docs/img/quickstart-example/jobmanager-running-new.png and b/docs/img/quickstart-example/jobmanager-running-new.png differ

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/7af127ea/docs/img/quickstart-example/kmeans003.png
----------------------------------------------------------------------
diff --git a/docs/img/quickstart-example/kmeans003.png b/docs/img/quickstart-example/kmeans003.png
index ab9a61b..32f8dbb 100644
Binary files a/docs/img/quickstart-example/kmeans003.png and b/docs/img/quickstart-example/kmeans003.png differ

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/7af127ea/docs/img/quickstart-example/kmeans008.png
----------------------------------------------------------------------
diff --git a/docs/img/quickstart-example/kmeans008.png b/docs/img/quickstart-example/kmeans008.png
index c2e2b81..b372fd1 100644
Binary files a/docs/img/quickstart-example/kmeans008.png and b/docs/img/quickstart-example/kmeans008.png differ

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/7af127ea/docs/img/quickstart-example/kmeans015.png
----------------------------------------------------------------------
diff --git a/docs/img/quickstart-example/kmeans015.png b/docs/img/quickstart-example/kmeans015.png
index 3f0873a..8b6fb51 100644
Binary files a/docs/img/quickstart-example/kmeans015.png and b/docs/img/quickstart-example/kmeans015.png differ

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/7af127ea/docs/img/quickstart-example/result003.png
----------------------------------------------------------------------
diff --git a/docs/img/quickstart-example/result003.png b/docs/img/quickstart-example/result003.png
index 0b3c502..bdcef44 100644
Binary files a/docs/img/quickstart-example/result003.png and b/docs/img/quickstart-example/result003.png differ

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/7af127ea/docs/img/quickstart-example/result008.png
----------------------------------------------------------------------
diff --git a/docs/img/quickstart-example/result008.png b/docs/img/quickstart-example/result008.png
index fe215ad..921c73c 100644
Binary files a/docs/img/quickstart-example/result008.png and b/docs/img/quickstart-example/result008.png differ

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/7af127ea/docs/img/quickstart-example/result015.png
----------------------------------------------------------------------
diff --git a/docs/img/quickstart-example/result015.png b/docs/img/quickstart-example/result015.png
index d0428ac..9dbc6c4 100644
Binary files a/docs/img/quickstart-example/result015.png and b/docs/img/quickstart-example/result015.png differ

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/7af127ea/docs/img/quickstart-example/run-webclient.png
----------------------------------------------------------------------
diff --git a/docs/img/quickstart-example/run-webclient.png b/docs/img/quickstart-example/run-webclient.png
index e86bbe4..3dfb9ca 100644
Binary files a/docs/img/quickstart-example/run-webclient.png and b/docs/img/quickstart-example/run-webclient.png differ

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/7af127ea/docs/run_example_quickstart.md
----------------------------------------------------------------------
diff --git a/docs/run_example_quickstart.md b/docs/run_example_quickstart.md
index 7fabaab..40fcb95 100644
--- a/docs/run_example_quickstart.md
+++ b/docs/run_example_quickstart.md
@@ -5,22 +5,20 @@ title: "Quick Start: Run K-Means Example"
 * This will be replaced by the TOC
 {:toc}
-This guide will demonstrate Flink's features by example. You will see how you can leverage Flink's Iteration-feature to find clusters in a dataset using [K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering).
-On the way, you will see the compiler, the status interface and the result of the algorithm.
+This guide walks you through the steps of executing an example program ([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on Flink. On the way, you will see the a visualization of the program, the optimized execution plan, and track the progress of its execution.
+## Setup Flink
+Follow the [instructions](setup_quickstart.html) to setup Flink and enter the root directory of your Flink setup.
-## Generate Input Data
+## Generate Input Data
 Flink contains a data generator for K-Means.
 ~~~bash
-# Download Flink
-wget {{ site.FLINK_DOWNLOAD_URL_HADOOP_1_STABLE }}
-tar xzf flink-*.tgz
-cd flink-*
+# Assuming you are in the root directory of your Flink setup
 mkdir kmeans
 cd kmeans
 # Run data generator
-java -cp ../examples/flink-java-examples-{{ site.FLINK_VERSION_STABLE }}-KMeans.jar org.apache.flink.example.java.clustering.util.KMeansDataGenerator 500 10 0.08
+java -cp ../examples/flink-java-examples-*-KMeans.jar org.apache.flink.examples.java.clustering.util.KMeansDataGenerator 500 10 0.08
 cp /tmp/points .
 cp /tmp/centers .
 ~~~
@@ -31,19 +29,18 @@ The generator has the following arguments:
 KMeansDataGenerator <numberOfDataPoints> <numberOfClusterCenters> [<relative stddev>] [<centroid range>] [<seed>]
 ~~~
-The _relative standard deviation_ is an interesting tuning parameter: it determines the closeness of the points to the centers.
+The _relative standard deviation_ is an interesting tuning parameter. It determines the closeness of the points to randomly generated centers.
-The `kmeans/` directory should now contain two files: `centers` and `points`.
+The `kmeans/` directory should now contain two files: `centers` and `points`. The `points` file contains the points to cluster and the `centers` file contains initial cluster centers.
-## Review Input Data
-Use the `plotPoints.py` tool to review the result of the data generator. [Download Python Script](quickstart/plotPoints.py)
+## Inspect the Input Data
+Use the `plotPoints.py` tool to review the generated data points. [Download Python Script](quickstart/plotPoints.py)
 ~~~ bash
-python plotPoints.py points points input
+python plotPoints.py points ./points input
 ~~~
-
 Note: You might have to install [matplotlib](http://matplotlib.org/) (`python-matplotlib` package on Ubuntu) to use the Python script.
 You can review the input data stored in the `input-plot.pdf`, for example with Evince (`evince input-plot.pdf`).
@@ -55,37 +52,39 @@ The following overview presents the impact of the different standard deviations
 |<img src="img/quickstart-example/kmeans003.png" alt="example1" style="width: 275px;"/>|<img src="img/quickstart-example/kmeans008.png" alt="example2" style="width: 275px;"/>|<img src="img/quickstart-example/kmeans015.png" alt="example3" style="width: 275px;"/>|
-## Run Clustering
-We are using the generated input data to run the clustering using a Flink job.
+## Start Flink
+Start Flink and the web job submission client on your local machine.
- # go to the Flink-root directory
- cd flink
- # start Flink (use ./bin/start-cluster.sh if you're on a cluster)
- ./bin/start-local.sh
- # Start Flink web client
- ./bin/start-webclient.sh
+~~~ bash
+# return to the Flink root directory
+cd ..
+# start Flink
+./bin/start-local.sh
+# Start the web client
+./bin/start-webclient.sh
+~~~
-## Review Flink Compiler
-The Flink webclient allows to submit Flink programs using a graphical user interface.
+## Inspect and Run the K-Means Example Program
+The Flink web client allows to submit Flink programs using a graphical user interface.
 <div class="row" style="padding-top:15px">
 <div class="col-md-6">
 <a data-lightbox="compiler" href="img/quickstart-example/run-webclient.png" data-lightbox="example-1"><img class="img-responsive" src="img/quickstart-example/run-webclient.png" /></a>
 </div>
 <div class="col-md-6">
- 1. <a href="http://localhost:8080/launch.html">Open webclient on localhost:8080</a> <br>
- 2. Upload the file.
+ 1. Open web client on <a href="http://localhost:8080/launch.html">localhost:8080</a> <br>
+ 2. Upload the K-Mean job JAR file.
 {% highlight bash %}
- examples/flink-java-examples-{{ site.FLINK_VERSION_STABLE }}-KMeans.jar
+ ./examples/flink-java-examples-*-KMeans.jar
 {% endhighlight %} </br>
 3. Select it in the left box to see how the operators in the plan are connected to each other. <br>
 4. Enter the arguments in the lower left box:
 {% highlight bash %}
- file://<pathToGenerated>points file://<pathToGenerated>centers file://<pathToGenerated>result 10
+ file://<pathToFlink>/kmeans/points file://<pathToFlink>/kmeans/centers file://<pathToFlink>/kmeans/result 10
 {% endhighlight %}
 For example:
 {% highlight bash %}
- file:///tmp/flink/kmeans/points file:///tmp/flink/kmeans/centers file:///tmp/flink/kmeans/result 20
+ file:///tmp/flink/kmeans/points file:///tmp/flink/kmeans/centers file:///tmp/flink/kmeans/result 10
 {% endhighlight %}
 </div>
 </div>
@@ -96,7 +95,7 @@ The Flink webclient allows to submit Flink programs using a graphical user inter
 </div>
 <div class="col-md-6">
- 1. Press the <b>RunJob</b> to see the optimzer plan. <br>
+ 1. Press the <b>RunJob</b> to see the optimizer plan. <br>
 2. Inspect the operators and see the properties (input sizes, cost estimation) determined by the optimizer.
 </div>
 </div>
@@ -107,18 +106,27 @@ The Flink webclient allows to submit Flink programs using a graphical user inter
 </div>
 <div class="col-md-6">
 1. Press the <b>Continue</b> button to start executing the job. <br>
- 2. <a href="http://localhost:8080/launch.html">Open Flink's monitoring interface</a> to see the job's progress.<br>
- 3. Once the job has finished, you can analyize the runtime of the individual operators.
+ 2. <a href="http://localhost:8080/launch.html">Open Flink's monitoring interface</a> to see the job's progress. (Due to the small input data, the job will finish really quick!)<br>
+ 3. Once the job has finished, you can analyze the runtime of the individual operators.
 </div>
 </div>
+## Shutdown Flink
+Stop Flink when you are done.
-## Analyze the Result
+~~~ bash
+# stop Flink
+./bin/stop-local.sh
+# Stop the Flink web client
+./bin/stop-webclient.sh
+~~~
-Use the [Python Script](quickstart/plotPoints.py) again to visualize the result
+## Analyze the Result
+Use the [Python Script](quickstart/plotPoints.py) again to visualize the result.
 ~~~bash
-python plotPoints.py result result result-pdf
+cd kmeans
+python plotPoints.py result ./result clusters
 ~~~
 The following three pictures show the results for the sample input above. Play around with the parameters (number of iterations, number of clusters) to see how they affect the result.
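
Reviewer note (not part of the commit): the quickstart's closing suggestion to vary the number of iterations and clusters can be tried without a Flink setup. The following is a minimal sketch of plain K-Means in Python on 2D points. It is not the code of Flink's KMeans example or of `KMeansDataGenerator`; the names `kmeans` and `make_points` and all input values are assumptions made for illustration.

```python
import random


def kmeans(points, centers, iterations):
    """Plain K-Means on 2D points; returns the centers after `iterations` rounds.

    Illustrative sketch only, not Flink's implementation.
    """
    centers = list(centers)
    for _ in range(iterations):
        # Per center: running sum of x, running sum of y, number of assigned points.
        sums = [[0.0, 0.0, 0] for _ in centers]
        for x, y in points:
            # Assign the point to its nearest center (squared Euclidean distance).
            nearest = min(
                range(len(centers)),
                key=lambda i: (x - centers[i][0]) ** 2 + (y - centers[i][1]) ** 2,
            )
            sums[nearest][0] += x
            sums[nearest][1] += y
            sums[nearest][2] += 1
        # Move each center to the mean of its points; keep it if it got no points.
        centers = [
            (sx / n, sy / n) if n else c
            for (sx, sy, n), c in zip(sums, centers)
        ]
    return centers


def make_points(true_centers, per_center, stddev, seed=42):
    """Hypothetical stand-in for a data generator: Gaussian blobs around given centers."""
    rng = random.Random(seed)
    return [
        (rng.gauss(cx, stddev), rng.gauss(cy, stddev))
        for cx, cy in true_centers
        for _ in range(per_center)
    ]


if __name__ == "__main__":
    points = make_points([(0.0, 0.0), (10.0, 10.0)], per_center=100, stddev=0.5)
    initial = [(1.0, 1.0), (8.0, 9.0)]
    # Compare the centers after different numbers of iterations.
    for iterations in (0, 1, 10):
        print(iterations, kmeans(points, initial, iterations))
```

Raising `stddev` in `make_points` makes the blobs overlap more, which is the same kind of effect the quickstart illustrates with its three standard deviation settings.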
