[ https://issues.apache.org/jira/browse/FLINK-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15185087#comment-15185087 ]
ASF GitHub Bot commented on FLINK-3591: --------------------------------------- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/1774#discussion_r55375679 --- Diff: docs/quickstart/run_example_quickstart.md --- @@ -27,116 +27,360 @@ under the License. * This will be replaced by the TOC {:toc} -This guide walks you through the steps of executing an example program ([K-Means clustering](http://en.wikipedia.org/wiki/K-means_clustering)) on Flink. -On the way, you will see the a visualization of the program, the optimized execution plan, and track the progress of its execution. +In this guide we will start from scratch and fo from setting up a Flink project and running +a streaming analysis program on a Flink cluster. + +Wikipedia provides an IRC channel where all edits to the wiki are logged. We are going to +read this channel in Flink and count the number of bytes that each user edits within +a given window of time. This is easy enough to implement in a few minutes using Flink but it will +give you a good foundation from which to start building more complex analysis programs on your own. + +## Setting up a Maven Project + +We are going to use a Flink Maven Archetype for creating our project stucture. Please +see [Java API Quickstart]({{ site.baseurl }}/quickstart/java_api_quickstart.html) for more details +about this. For our purposes, the command to run is this: + +{% highlight bash %} +$ mvn archetype:generate\ + -DarchetypeGroupId=org.apache.flink\ + -DarchetypeArtifactId=flink-quickstart-java\ + -DarchetypeVersion=1.0.0\ + -DgroupId=wiki-edits\ + -DartifactId=wiki-edits\ + -Dversion=0.1\ + -Dpackage=wikiedits\ + -DinteractiveMode=false\ +{% endhighlight %} + +You can edit the `groupId`, `artifactId` and `package` if you like. With the above parameters, +maven will create a project structure that looks like this: + +{% highlight bash %} +$ tree wiki-edits +wiki-edits/ +├── pom.xml +└── src + └── main + ├── java + │ └── wikiedits + │ ├── Job.java + │ ├── SocketTextStreamWordCount.java + │ └── WordCount.java + └── resources + └── log4j.properties +{% endhighlight %} + +There is our `pom.xml` file that already has the Flink dependencies added in the root directory and +several example Flink programs in `src/main/java`. We can delete the example programs, since +we are going to start from scratch: + +{% highlight bash %} +$ rm wiki-edits/src/main/java/wikiedits/*.java +{% endhighlight %} + +As a last step we need to add the Flink wikipedia connector as a dependency so that we can +use it in our program. Edit the `dependencies` section so that it looks like this: + +{% highlight xml %} +<dependencies> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-java</artifactId> + <version>${flink.version}</version> + </dependency> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-streaming-java_2.10</artifactId> + <version>${flink.version}</version> + </dependency> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-clients_2.10</artifactId> + <version>${flink.version}</version> + </dependency> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-wikiedits_2.10</artifactId> + <version>${flink.version}</version> + </dependency> +</dependencies> +{% endhighlight %} + +Notice the `flink-connector-wikiedits_2.10` dependency that was added. + +## Writing a Flink Program + +It's coding time. Fire up your favorite IDE and import the Maven project or open a text editor and +create the file `src/main/wikiedits/WikipediaAnalysis.java`: + +{% highlight java %} +package wikiedits; + +public class WikipediaAnalysis { + + public static void main(String[] args) throws Exception { + + } +} +{% endhighlight %} + +I admit it's very bare bones now but we will fill it as we go. Note, that I'll not give +import statements here since IDEs can add them automatically. At the end of this section I'll show +the complete code with import statements if you simply want to skip ahead and enter that in your +editor. + +The first step in a Flink program is to create a `StreamExecutionEnvironment` +(or `ExecutionEnvironment` if you are writing a batch job). This can be used to set execution +parameters and create sources for reading from external systems. So let's go ahead, add +this to the main method: + +{% highlight java %} +StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment(); +{% endhighlight %} + +Next, we will create a source that reads from the Wikipedia IRC log: + +{% highlight java %} +DataStream<WikipediaEditEvent> edits = see.addSource(new WikipediaEditsSource()); +{% endhighlight %} + +This creates a `DataStream` of `WikipediaEditEvent` elements that we can further process. For +the purposes of this example we are interrested in determining the number of added or removed --- End diff -- typo interested > Replace Quickstart K-Means Example by Streaming Example > ------------------------------------------------------- > > Key: FLINK-3591 > URL: https://issues.apache.org/jira/browse/FLINK-3591 > Project: Flink > Issue Type: Improvement > Components: Documentation > Reporter: Aljoscha Krettek > Assignee: Aljoscha Krettek > -- This message was sent by Atlassian JIRA (v6.3.4#6332)