This is an automated email from the ASF dual-hosted git repository. sjwiesman pushed a commit to branch release-1.11 in repository https://gitbox.apache.org/repos/asf/flink.git
commit 0861d2e586e5202c992b8a15ec23ca71d56dea62 Author: Seth Wiesman <sjwies...@gmail.com> AuthorDate: Mon Jun 8 11:22:15 2020 -0500 [FLINK-17980][docs] Move project setup into DataStream section --- docs/dev/project-configuration.md | 559 +++++++++++++++++++++ docs/dev/project-configuration.zh.md | 559 +++++++++++++++++++++ docs/getting-started/project-setup/dependencies.md | 237 --------- .../project-setup/dependencies.zh.md | 200 -------- .../project-setup/java_api_quickstart.md | 375 -------------- .../project-setup/java_api_quickstart.zh.md | 360 ------------- .../project-setup/scala_api_quickstart.md | 249 --------- .../project-setup/scala_api_quickstart.zh.md | 241 --------- docs/redirects/dependencies.md | 2 +- ...ndencies.md => getting-started-dependencies.md} | 6 +- .../index.md => redirects/java-quickstart.md} | 9 +- .../index.zh.md => redirects/scala-quickstart.md} | 9 +- docs/redirects/scala_quickstart.md | 2 +- 13 files changed, 1131 insertions(+), 1677 deletions(-) diff --git a/docs/dev/project-configuration.md b/docs/dev/project-configuration.md new file mode 100644 index 0000000..a23b134 --- /dev/null +++ b/docs/dev/project-configuration.md @@ -0,0 +1,559 @@ +--- +title: "Project Configuration" +nav-parent_id: streaming +nav-pos: 301 +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +Every Flink application depends on a set of Flink libraries. At the bare minimum, the application depends +on the Flink APIs. Many applications depend in addition on certain connector libraries (like Kafka, Cassandra, etc.). +When running Flink applications (either in a distributed deployment, or in the IDE for testing), the Flink +runtime library must be available as well. + +* This will be replaced by the TOC +{:toc} + +## Flink Core and Application Dependencies + +As with most systems that run user-defined applications, there are two broad categories of dependencies and libraries in Flink: + + - **Flink Core Dependencies**: Flink itself consists of a set of classes and dependencies that are needed to run the system, for example + coordination, networking, checkpoints, failover, APIs, operations (such as windowing), resource management, etc. + The set of all these classes and dependencies forms the core of Flink's runtime and must be present when a Flink + application is started. + + These core classes and dependencies are packaged in the `flink-dist` jar. They are part of Flink's `lib` folder and + part of the basic Flink container images. Think of these dependencies as similar to Java's core library (`rt.jar`, `charsets.jar`, etc.), + which contains the classes like `String` and `List`. + + The Flink Core Dependencies do not contain any connectors or libraries (CEP, SQL, ML, etc.) 
in order to avoid having an excessive + number of dependencies and classes in the classpath by default. In fact, we try to keep the core dependencies as slim as possible + to keep the default classpath small and avoid dependency clashes. + + - The **User Application Dependencies** are all connectors, formats, or libraries that a specific user application needs. + + The user application is typically packaged into an *application jar*, which contains the application code and the required + connector and library dependencies. + + The user application dependencies explicitly do not include the Flink DataStream APIs and runtime dependencies, + because those are already part of Flink's Core Dependencies. + + +## Setting up a Project: Basic Dependencies + +Every Flink application needs as the bare minimum the API dependencies, to develop against. + +When setting up a project manually, you need to add the following dependencies for the Java/Scala API +(here presented in Maven syntax, but the same dependencies apply to other build tools (Gradle, SBT, etc.) as well. + +<div class="codetabs" markdown="1"> +<div data-lang="java" markdown="1"> +{% highlight xml %} +<dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-streaming-java{{ site.scala_version_suffix }}</artifactId> + <version>{{site.version }}</version> + <scope>provided</scope> +</dependency> +{% endhighlight %} +</div> +<div data-lang="scala" markdown="1"> +{% highlight xml %} +<dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-streaming-scala{{ site.scala_version_suffix }}</artifactId> + <version>{{site.version }}</version> + <scope>provided</scope> +</dependency> +{% endhighlight %} +</div> +</div> + +**Important:** Please note that all these dependencies have their scope set to *provided*. +That means that they are needed to compile against, but that they should not be packaged into the +project's resulting application jar file - these dependencies are Flink Core Dependencies, +which are already available in any setup. + +It is highly recommended keeping the dependencies in scope *provided*. If they are not set to *provided*, +the best case is that the resulting JAR becomes excessively large, because it also contains all Flink core +dependencies. The worst case is that the Flink core dependencies that are added to the application's jar file +clash with some of your own dependency versions (which is normally avoided through inverted classloading). + +**Note on IntelliJ:** To make the applications run within IntelliJ IDEA it is necessary to tick the +`Include dependencies with "Provided" scope` box in the run configuration. +If this option is not available (possibly due to using an older IntelliJ IDEA version), then a simple workaround +is to create a test that calls the applications `main()` method. + + +## Adding Connector and Library Dependencies + +Most applications need specific connectors or libraries to run, for example a connector to Kafka, Cassandra, etc. +These connectors are not part of Flink's core dependencies and must be added as dependencies to the application. 
+ +Below is an example adding the connector for Kafka as a dependency (Maven syntax): +{% highlight xml %} +<dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-kafka{{ site.scala_version_suffix }}</artifactId> + <version>{{site.version }}</version> +</dependency> +{% endhighlight %} + +We recommend packaging the application code and all its required dependencies into one *jar-with-dependencies* which +we refer to as the *application jar*. The application jar can be submitted to an already running Flink cluster, +or added to a Flink application container image. + +Projects created from the [Java Project Template]({{ site.baseurl }}/dev/projectsetup/java_api_quickstart.html) or +[Scala Project Template]({{ site.baseurl }}/dev/projectsetup/scala_api_quickstart.html) are configured to automatically include +the application dependencies into the application jar when running `mvn clean package`. For projects that are +not set up from those templates, we recommend adding the Maven Shade Plugin (as listed in the Appendix below) +to build the application jar with all required dependencies. + +**Important:** For Maven (and other build tools) to correctly package the dependencies into the application jar, +these application dependencies must be specified in scope *compile* (unlike the core dependencies, which +must be specified in scope *provided*). + + +## Scala Versions + +Scala versions (2.11, 2.12, etc.) are not binary compatible with one another. +For that reason, Flink for Scala 2.11 cannot be used with an application that uses +Scala 2.12. + +All Flink dependencies that (transitively) depend on Scala are suffixed with the +Scala version that they are built for, for example `flink-streaming-scala_2.11`. + +Developers that only use Java can pick any Scala version, Scala developers need to +pick the Scala version that matches their application's Scala version. + +Please refer to the [build guide]({{ site.baseurl }}/flinkDev/building.html#scala-versions) +for details on how to build Flink for a specific Scala version. + +## Hadoop Dependencies + +**General rule: It should never be necessary to add Hadoop dependencies directly to your application.** +*(The only exception being when using existing Hadoop input-/output formats with Flink's Hadoop compatibility wrappers)* + +If you want to use Flink with Hadoop, you need to have a Flink setup that includes the Hadoop dependencies, rather than +adding Hadoop as an application dependency. Please refer to the [Hadoop Setup Guide]({{ site.baseurl }}/ops/deployment/hadoop.html) +for details. + +There are two main reasons for that design: + + - Some Hadoop interaction happens in Flink's core, possibly before the user application is started, for example + setting up HDFS for checkpoints, authenticating via Hadoop's Kerberos tokens, or deployment on YARN. + + - Flink's inverted classloading approach hides many transitive dependencies from the core dependencies. That applies not only + to Flink's own core dependencies, but also to Hadoop's dependencies when present in the setup. + That way, applications can use different versions of the same dependencies without running into dependency conflicts (and + trust us, that's a big deal, because Hadoops dependency tree is huge.) + +If you need Hadoop dependencies during testing or development inside the IDE (for example for HDFS access), please configure +these dependencies similar to the scope of the dependencies to *test* or to *provided*. 
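
As a rough illustration only (the exact artifact and version depend on your Hadoop distribution; `hadoop-client` and the `hadoop.version` property below are placeholders), such a test/IDE-only dependency might be declared like this:

{% highlight xml %}
<!-- placeholder example: needed only for IDE testing / HDFS access,
     never packaged into the application jar -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
    <scope>provided</scope>
</dependency>
{% endhighlight %}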
+ +## Maven Quickstart + +#### Requirements + +The only requirements are working __Maven 3.0.4__ (or higher) and __Java 8.x__ installations. + +#### Create Project + +Use one of the following commands to __create a project__: + +<ul class="nav nav-tabs" style="border-bottom: none;"> + <li class="active"><a href="#maven-archetype" data-toggle="tab">Use <strong>Maven archetypes</strong></a></li> + <li><a href="#quickstart-script" data-toggle="tab">Run the <strong>quickstart script</strong></a></li> +</ul> +<div class="tab-content"> + <div class="tab-pane active" id="maven-archetype"> +{% highlight bash %} +$ mvn archetype:generate \ + -DarchetypeGroupId=org.apache.flink \ + -DarchetypeArtifactId=flink-quickstart-java \{% unless site.is_stable %} + -DarchetypeCatalog=https://repository.apache.org/content/repositories/snapshots/ \{% endunless %} + -DarchetypeVersion={{site.version}} +{% endhighlight %} + This allows you to <strong>name your newly created project</strong>. It will interactively ask you for the groupId, artifactId, and package name. + </div> + <div class="tab-pane" id="quickstart-script"> +{% highlight bash %} +{% if site.is_stable %} +$ curl https://flink.apache.org/q/quickstart.sh | bash -s {{site.version}} +{% else %} +$ curl https://flink.apache.org/q/quickstart-SNAPSHOT.sh | bash -s {{site.version}} +{% endif %} +{% endhighlight %} + + </div> + {% unless site.is_stable %} + <p style="border-radius: 5px; padding: 5px" class="bg-danger"> + <b>Note</b>: For Maven 3.0 or higher, it is no longer possible to specify the repository (-DarchetypeCatalog) via the command line. For details about this change, please refer to <a href="http://maven.apache.org/archetype/maven-archetype-plugin/archetype-repository.html">Maven official document</a> + If you wish to use the snapshot repository, you need to add a repository entry to your settings.xml. For example: +{% highlight bash %} +<settings> + <activeProfiles> + <activeProfile>apache</activeProfile> + </activeProfiles> + <profiles> + <profile> + <id>apache</id> + <repositories> + <repository> + <id>apache-snapshots</id> + <url>https://repository.apache.org/content/repositories/snapshots/</url> + </repository> + </repositories> + </profile> + </profiles> +</settings> +{% endhighlight %} + </p> + {% endunless %} +</div> + +We recommend you __import this project into your IDE__ to develop and +test it. IntelliJ IDEA supports Maven projects out of the box. +If you use Eclipse, the [m2e plugin](http://www.eclipse.org/m2e/) +allows to [import Maven projects](http://books.sonatype.com/m2eclipse-book/reference/creating-sect-importing-projects.html#fig-creating-import). +Some Eclipse bundles include that plugin by default, others require you +to install it manually. + +*Please note*: The default JVM heapsize for Java may be too +small for Flink. You have to manually increase it. +In Eclipse, choose `Run Configurations -> Arguments` and write into the `VM Arguments` box: `-Xmx800m`. +In IntelliJ IDEA recommended way to change JVM options is from the `Help | Edit Custom VM Options` menu. See [this article](https://intellij-support.jetbrains.com/hc/en-us/articles/206544869-Configuring-JVM-options-and-platform-properties) for details. + +#### Build Project + +If you want to __build/package your project__, go to your project directory and +run the '`mvn clean package`' command. 
+You will __find a JAR file__ that contains your application, plus connectors and libraries +that you may have added as dependencies to the application: `target/<artifact-id>-<version>.jar`. + +__Note:__ If you use a different class than *StreamingJob* as the application's main class / entry point, +we recommend you change the `mainClass` setting in the `pom.xml` file accordingly. That way, Flink +can run the application from the JAR file without additionally specifying the main class. + +## Gradle + +#### Requirements + +The only requirements are working __Gradle 3.x__ (or higher) and __Java 8.x__ installations. + +#### Create Project + +Use one of the following commands to __create a project__: + +<ul class="nav nav-tabs" style="border-bottom: none;"> + <li class="active"><a href="#gradle-example" data-toggle="tab"><strong>Gradle example</strong></a></li> + <li><a href="#gradle-script" data-toggle="tab">Run the <strong>quickstart script</strong></a></li> +</ul> +<div class="tab-content"> + <div class="tab-pane active" id="gradle-example"> + + <ul class="nav nav-tabs" style="border-bottom: none;"> + <li class="active"><a href="#gradle-build" data-toggle="tab"><tt>build.gradle</tt></a></li> + <li><a href="#gradle-settings" data-toggle="tab"><tt>settings.gradle</tt></a></li> + </ul> + <div class="tab-content"> +<!-- NOTE: Any change to the build scripts here should also be reflected in flink-web/q/gradle-quickstart.sh !! --> + <div class="tab-pane active" id="gradle-build"> +{% highlight gradle %} +buildscript { + repositories { + jcenter() // this applies only to the Gradle 'Shadow' plugin + } + dependencies { + classpath 'com.github.jengelman.gradle.plugins:shadow:2.0.4' + } +} + +plugins { + id 'java' + id 'application' + // shadow plugin to produce fat JARs + id 'com.github.johnrengelman.shadow' version '2.0.4' +} + + +// artifact properties +group = 'org.myorg.quickstart' +version = '0.1-SNAPSHOT' +mainClassName = 'org.myorg.quickstart.StreamingJob' +description = """Flink Quickstart Job""" + +ext { + javaVersion = '1.8' + flinkVersion = '{{ site.version }}' + scalaBinaryVersion = '{{ site.scala_version }}' + slf4jVersion = '1.7.15' + log4jVersion = '2.12.1' +} + + +sourceCompatibility = javaVersion +targetCompatibility = javaVersion +tasks.withType(JavaCompile) { + options.encoding = 'UTF-8' +} + +applicationDefaultJvmArgs = ["-Dlog4j.configurationFile=log4j2.properties"] + +task wrapper(type: Wrapper) { + gradleVersion = '3.1' +} + +// declare where to find the dependencies of your project +repositories { + mavenCentral() + maven { url "https://repository.apache.org/content/repositories/snapshots/" } +} + +// NOTE: We cannot use "compileOnly" or "shadow" configurations since then we could not run code +// in the IDE or with "gradle run". We also cannot exclude transitive dependencies from the +// shadowJar yet (see https://github.com/johnrengelman/shadow/issues/159). +// -> Explicitly define the // libraries we want to be included in the "flinkShadowJar" configuration! 
+configurations { + flinkShadowJar // dependencies which go into the shadowJar + + // always exclude these (also from transitive dependencies) since they are provided by Flink + flinkShadowJar.exclude group: 'org.apache.flink', module: 'force-shading' + flinkShadowJar.exclude group: 'com.google.code.findbugs', module: 'jsr305' + flinkShadowJar.exclude group: 'org.slf4j' + flinkShadowJar.exclude group: 'org.apache.logging.log4j' +} + +// declare the dependencies for your production and test code +dependencies { + // -------------------------------------------------------------- + // Compile-time dependencies that should NOT be part of the + // shadow jar and are provided in the lib folder of Flink + // -------------------------------------------------------------- + compile "org.apache.flink:flink-streaming-java_${scalaBinaryVersion}:${flinkVersion}" + + // -------------------------------------------------------------- + // Dependencies that should be part of the shadow jar, e.g. + // connectors. These must be in the flinkShadowJar configuration! + // -------------------------------------------------------------- + //flinkShadowJar "org.apache.flink:flink-connector-kafka-0.11_${scalaBinaryVersion}:${flinkVersion}" + + compile "org.apache.logging.log4j:log4j-api:${log4jVersion}" + compile "org.apache.logging.log4j:log4j-core:${log4jVersion}" + compile "org.apache.logging.log4j:log4j-slf4j-impl:${log4jVersion}" + compile "org.slf4j:slf4j-log4j12:${slf4jVersion}" + + // Add test dependencies here. + // testCompile "junit:junit:4.12" +} + +// make compileOnly dependencies available for tests: +sourceSets { + main.compileClasspath += configurations.flinkShadowJar + main.runtimeClasspath += configurations.flinkShadowJar + + test.compileClasspath += configurations.flinkShadowJar + test.runtimeClasspath += configurations.flinkShadowJar + + javadoc.classpath += configurations.flinkShadowJar +} + +run.classpath = sourceSets.main.runtimeClasspath + +jar { + manifest { + attributes 'Built-By': System.getProperty('user.name'), + 'Build-Jdk': System.getProperty('java.version') + } +} + +shadowJar { + configurations = [project.configurations.flinkShadowJar] +} +{% endhighlight %} + </div> + <div class="tab-pane" id="gradle-settings"> +{% highlight gradle %} +rootProject.name = 'quickstart' +{% endhighlight %} + </div> + </div> + </div> + + <div class="tab-pane" id="gradle-script"> +{% highlight bash %} +bash -c "$(curl https://flink.apache.org/q/gradle-quickstart.sh)" -- {{site.version}} {{site.scala_version}} +{% endhighlight %} + This allows you to <strong>name your newly created project</strong>. It will interactively ask + you for the project name, organization (also used for the package name), project version, + Scala and Flink version. + </div> +</div> + +We recommend you __import this project into your IDE__ to develop and +test it. IntelliJ IDEA supports Gradle projects after installing the `Gradle` plugin. +Eclipse does so via the [Eclipse Buildship](https://projects.eclipse.org/projects/tools.buildship) plugin +(make sure to specify a Gradle version >= 3.0 in the last step of the import wizard; the `shadow` plugin requires it). +You may also use [Gradle's IDE integration](https://docs.gradle.org/current/userguide/userguide.html#ide-integration) +to create project files from Gradle. + + +*Please note*: The default JVM heapsize for Java may be too +small for Flink. You have to manually increase it. 
+In Eclipse, choose `Run Configurations -> Arguments` and write into the `VM Arguments` box: `-Xmx800m`. +In IntelliJ IDEA recommended way to change JVM options is from the `Help | Edit Custom VM Options` menu. See [this article](https://intellij-support.jetbrains.com/hc/en-us/articles/206544869-Configuring-JVM-options-and-platform-properties) for details. + +#### Build Project + +If you want to __build/package your project__, go to your project directory and +run the '`gradle clean shadowJar`' command. +You will __find a JAR file__ that contains your application, plus connectors and libraries +that you may have added as dependencies to the application: `build/libs/<project-name>-<version>-all.jar`. + +__Note:__ If you use a different class than *StreamingJob* as the application's main class / entry point, +we recommend you change the `mainClassName` setting in the `build.gradle` file accordingly. That way, Flink +can run the application from the JAR file without additionally specifying the main class. + +## SBT + +#### Create Project + +You can scaffold a new project via either of the following two methods: + +<ul class="nav nav-tabs" style="border-bottom: none;"> + <li class="active"><a href="#sbt_template" data-toggle="tab">Use the <strong>sbt template</strong></a></li> + <li><a href="#quickstart-script-sbt" data-toggle="tab">Run the <strong>quickstart script</strong></a></li> +</ul> + +<div class="tab-content"> + <div class="tab-pane active" id="sbt_template"> +{% highlight bash %} +$ sbt new tillrohrmann/flink-project.g8 +{% endhighlight %} + This will prompt you for a couple of parameters (project name, flink version...) and then create a Flink project from the <a href="https://github.com/tillrohrmann/flink-project.g8">flink-project template</a>. + You need sbt >= 0.13.13 to execute this command. You can follow this <a href="http://www.scala-sbt.org/download.html">installation guide</a> to obtain it if necessary. + </div> + <div class="tab-pane" id="quickstart-script-sbt"> +{% highlight bash %} +$ bash <(curl https://flink.apache.org/q/sbt-quickstart.sh) +{% endhighlight %} + This will create a Flink project in the <strong>specified</strong> project directory. + </div> +</div> + +#### Build Project + +In order to build your project you simply have to issue the `sbt clean assembly` command. +This will create the fat-jar __your-project-name-assembly-0.1-SNAPSHOT.jar__ in the directory __target/scala_your-major-scala-version/__. + +#### Run Project + +In order to run your project you have to issue the `sbt run` command. + +Per default, this will run your job in the same JVM as `sbt` is running. +In order to run your job in a distinct JVM, add the following line to `build.sbt` + +{% highlight scala %} +fork in run := true +{% endhighlight %} + + +#### IntelliJ + +We recommend using [IntelliJ](https://www.jetbrains.com/idea/) for your Flink job development. +In order to get started, you have to import your newly created project into IntelliJ. +You can do this via `File -> New -> Project from Existing Sources...` and then choosing your project's directory. +IntelliJ will then automatically detect the `build.sbt` file and set everything up. + +In order to run your Flink job, it is recommended to choose the `mainRunner` module as the classpath of your __Run/Debug Configuration__. +This will ensure, that all dependencies which are set to _provided_ will be available upon execution. 
+You can configure the __Run/Debug Configurations__ via `Run -> Edit Configurations...` and then choose `mainRunner` from the _Use classpath of module_ dropbox. + +#### Eclipse + +In order to import the newly created project into [Eclipse](https://eclipse.org/), you first have to create Eclipse project files for it. +These project files can be created via the [sbteclipse](https://github.com/typesafehub/sbteclipse) plugin. +Add the following line to your `PROJECT_DIR/project/plugins.sbt` file: + +{% highlight bash %} +addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "4.0.0") +{% endhighlight %} + +In `sbt` use the following command to create the Eclipse project files + +{% highlight bash %} +> eclipse +{% endhighlight %} + +Now you can import the project into Eclipse via `File -> Import... -> Existing Projects into Workspace` and then select the project directory. + + +## Appendix: Template for building a Jar with Dependencies + +To build an application JAR that contains all dependencies required for declared connectors and libraries, +you can use the following shade plugin definition: + +{% highlight xml %} +<build> + <plugins> + <plugin> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-shade-plugin</artifactId> + <version>3.1.1</version> + <executions> + <execution> + <phase>package</phase> + <goals> + <goal>shade</goal> + </goals> + <configuration> + <artifactSet> + <excludes> + <exclude>com.google.code.findbugs:jsr305</exclude> + <exclude>org.slf4j:*</exclude> + <exclude>log4j:*</exclude> + </excludes> + </artifactSet> + <filters> + <filter> + <!-- Do not copy the signatures in the META-INF folder. + Otherwise, this might cause SecurityExceptions when using the JAR. --> + <artifact>*:*</artifact> + <excludes> + <exclude>META-INF/*.SF</exclude> + <exclude>META-INF/*.DSA</exclude> + <exclude>META-INF/*.RSA</exclude> + </excludes> + </filter> + </filters> + <transformers> + <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"> + <mainClass>my.programs.main.clazz</mainClass> + </transformer> + </transformers> + </configuration> + </execution> + </executions> + </plugin> + </plugins> +</build> +{% endhighlight %} + +{% top %} diff --git a/docs/dev/project-configuration.zh.md b/docs/dev/project-configuration.zh.md new file mode 100644 index 0000000..a23b134 --- /dev/null +++ b/docs/dev/project-configuration.zh.md @@ -0,0 +1,559 @@ +--- +title: "Project Configuration" +nav-parent_id: streaming +nav-pos: 301 +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +Every Flink application depends on a set of Flink libraries. At the bare minimum, the application depends +on the Flink APIs. Many applications depend in addition on certain connector libraries (like Kafka, Cassandra, etc.). 
+When running Flink applications (either in a distributed deployment, or in the IDE for testing), the Flink +runtime library must be available as well. + +* This will be replaced by the TOC +{:toc} + +## Flink Core and Application Dependencies + +As with most systems that run user-defined applications, there are two broad categories of dependencies and libraries in Flink: + + - **Flink Core Dependencies**: Flink itself consists of a set of classes and dependencies that are needed to run the system, for example + coordination, networking, checkpoints, failover, APIs, operations (such as windowing), resource management, etc. + The set of all these classes and dependencies forms the core of Flink's runtime and must be present when a Flink + application is started. + + These core classes and dependencies are packaged in the `flink-dist` jar. They are part of Flink's `lib` folder and + part of the basic Flink container images. Think of these dependencies as similar to Java's core library (`rt.jar`, `charsets.jar`, etc.), + which contains the classes like `String` and `List`. + + The Flink Core Dependencies do not contain any connectors or libraries (CEP, SQL, ML, etc.) in order to avoid having an excessive + number of dependencies and classes in the classpath by default. In fact, we try to keep the core dependencies as slim as possible + to keep the default classpath small and avoid dependency clashes. + + - The **User Application Dependencies** are all connectors, formats, or libraries that a specific user application needs. + + The user application is typically packaged into an *application jar*, which contains the application code and the required + connector and library dependencies. + + The user application dependencies explicitly do not include the Flink DataStream APIs and runtime dependencies, + because those are already part of Flink's Core Dependencies. + + +## Setting up a Project: Basic Dependencies + +Every Flink application needs as the bare minimum the API dependencies, to develop against. + +When setting up a project manually, you need to add the following dependencies for the Java/Scala API +(here presented in Maven syntax, but the same dependencies apply to other build tools (Gradle, SBT, etc.) as well. + +<div class="codetabs" markdown="1"> +<div data-lang="java" markdown="1"> +{% highlight xml %} +<dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-streaming-java{{ site.scala_version_suffix }}</artifactId> + <version>{{site.version }}</version> + <scope>provided</scope> +</dependency> +{% endhighlight %} +</div> +<div data-lang="scala" markdown="1"> +{% highlight xml %} +<dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-streaming-scala{{ site.scala_version_suffix }}</artifactId> + <version>{{site.version }}</version> + <scope>provided</scope> +</dependency> +{% endhighlight %} +</div> +</div> + +**Important:** Please note that all these dependencies have their scope set to *provided*. +That means that they are needed to compile against, but that they should not be packaged into the +project's resulting application jar file - these dependencies are Flink Core Dependencies, +which are already available in any setup. + +It is highly recommended keeping the dependencies in scope *provided*. If they are not set to *provided*, +the best case is that the resulting JAR becomes excessively large, because it also contains all Flink core +dependencies. 
The worst case is that the Flink core dependencies that are added to the application's jar file +clash with some of your own dependency versions (which is normally avoided through inverted classloading). + +**Note on IntelliJ:** To make the applications run within IntelliJ IDEA it is necessary to tick the +`Include dependencies with "Provided" scope` box in the run configuration. +If this option is not available (possibly due to using an older IntelliJ IDEA version), then a simple workaround +is to create a test that calls the applications `main()` method. + + +## Adding Connector and Library Dependencies + +Most applications need specific connectors or libraries to run, for example a connector to Kafka, Cassandra, etc. +These connectors are not part of Flink's core dependencies and must be added as dependencies to the application. + +Below is an example adding the connector for Kafka as a dependency (Maven syntax): +{% highlight xml %} +<dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-kafka{{ site.scala_version_suffix }}</artifactId> + <version>{{site.version }}</version> +</dependency> +{% endhighlight %} + +We recommend packaging the application code and all its required dependencies into one *jar-with-dependencies* which +we refer to as the *application jar*. The application jar can be submitted to an already running Flink cluster, +or added to a Flink application container image. + +Projects created from the [Java Project Template]({{ site.baseurl }}/dev/projectsetup/java_api_quickstart.html) or +[Scala Project Template]({{ site.baseurl }}/dev/projectsetup/scala_api_quickstart.html) are configured to automatically include +the application dependencies into the application jar when running `mvn clean package`. For projects that are +not set up from those templates, we recommend adding the Maven Shade Plugin (as listed in the Appendix below) +to build the application jar with all required dependencies. + +**Important:** For Maven (and other build tools) to correctly package the dependencies into the application jar, +these application dependencies must be specified in scope *compile* (unlike the core dependencies, which +must be specified in scope *provided*). + + +## Scala Versions + +Scala versions (2.11, 2.12, etc.) are not binary compatible with one another. +For that reason, Flink for Scala 2.11 cannot be used with an application that uses +Scala 2.12. + +All Flink dependencies that (transitively) depend on Scala are suffixed with the +Scala version that they are built for, for example `flink-streaming-scala_2.11`. + +Developers that only use Java can pick any Scala version, Scala developers need to +pick the Scala version that matches their application's Scala version. + +Please refer to the [build guide]({{ site.baseurl }}/flinkDev/building.html#scala-versions) +for details on how to build Flink for a specific Scala version. + +## Hadoop Dependencies + +**General rule: It should never be necessary to add Hadoop dependencies directly to your application.** +*(The only exception being when using existing Hadoop input-/output formats with Flink's Hadoop compatibility wrappers)* + +If you want to use Flink with Hadoop, you need to have a Flink setup that includes the Hadoop dependencies, rather than +adding Hadoop as an application dependency. Please refer to the [Hadoop Setup Guide]({{ site.baseurl }}/ops/deployment/hadoop.html) +for details. 
+ +There are two main reasons for that design: + + - Some Hadoop interaction happens in Flink's core, possibly before the user application is started, for example + setting up HDFS for checkpoints, authenticating via Hadoop's Kerberos tokens, or deployment on YARN. + + - Flink's inverted classloading approach hides many transitive dependencies from the core dependencies. That applies not only + to Flink's own core dependencies, but also to Hadoop's dependencies when present in the setup. + That way, applications can use different versions of the same dependencies without running into dependency conflicts (and + trust us, that's a big deal, because Hadoops dependency tree is huge.) + +If you need Hadoop dependencies during testing or development inside the IDE (for example for HDFS access), please configure +these dependencies similar to the scope of the dependencies to *test* or to *provided*. + +## Maven Quickstart + +#### Requirements + +The only requirements are working __Maven 3.0.4__ (or higher) and __Java 8.x__ installations. + +#### Create Project + +Use one of the following commands to __create a project__: + +<ul class="nav nav-tabs" style="border-bottom: none;"> + <li class="active"><a href="#maven-archetype" data-toggle="tab">Use <strong>Maven archetypes</strong></a></li> + <li><a href="#quickstart-script" data-toggle="tab">Run the <strong>quickstart script</strong></a></li> +</ul> +<div class="tab-content"> + <div class="tab-pane active" id="maven-archetype"> +{% highlight bash %} +$ mvn archetype:generate \ + -DarchetypeGroupId=org.apache.flink \ + -DarchetypeArtifactId=flink-quickstart-java \{% unless site.is_stable %} + -DarchetypeCatalog=https://repository.apache.org/content/repositories/snapshots/ \{% endunless %} + -DarchetypeVersion={{site.version}} +{% endhighlight %} + This allows you to <strong>name your newly created project</strong>. It will interactively ask you for the groupId, artifactId, and package name. + </div> + <div class="tab-pane" id="quickstart-script"> +{% highlight bash %} +{% if site.is_stable %} +$ curl https://flink.apache.org/q/quickstart.sh | bash -s {{site.version}} +{% else %} +$ curl https://flink.apache.org/q/quickstart-SNAPSHOT.sh | bash -s {{site.version}} +{% endif %} +{% endhighlight %} + + </div> + {% unless site.is_stable %} + <p style="border-radius: 5px; padding: 5px" class="bg-danger"> + <b>Note</b>: For Maven 3.0 or higher, it is no longer possible to specify the repository (-DarchetypeCatalog) via the command line. For details about this change, please refer to <a href="http://maven.apache.org/archetype/maven-archetype-plugin/archetype-repository.html">Maven official document</a> + If you wish to use the snapshot repository, you need to add a repository entry to your settings.xml. For example: +{% highlight bash %} +<settings> + <activeProfiles> + <activeProfile>apache</activeProfile> + </activeProfiles> + <profiles> + <profile> + <id>apache</id> + <repositories> + <repository> + <id>apache-snapshots</id> + <url>https://repository.apache.org/content/repositories/snapshots/</url> + </repository> + </repositories> + </profile> + </profiles> +</settings> +{% endhighlight %} + </p> + {% endunless %} +</div> + +We recommend you __import this project into your IDE__ to develop and +test it. IntelliJ IDEA supports Maven projects out of the box. 
+If you use Eclipse, the [m2e plugin](http://www.eclipse.org/m2e/) +allows to [import Maven projects](http://books.sonatype.com/m2eclipse-book/reference/creating-sect-importing-projects.html#fig-creating-import). +Some Eclipse bundles include that plugin by default, others require you +to install it manually. + +*Please note*: The default JVM heapsize for Java may be too +small for Flink. You have to manually increase it. +In Eclipse, choose `Run Configurations -> Arguments` and write into the `VM Arguments` box: `-Xmx800m`. +In IntelliJ IDEA recommended way to change JVM options is from the `Help | Edit Custom VM Options` menu. See [this article](https://intellij-support.jetbrains.com/hc/en-us/articles/206544869-Configuring-JVM-options-and-platform-properties) for details. + +#### Build Project + +If you want to __build/package your project__, go to your project directory and +run the '`mvn clean package`' command. +You will __find a JAR file__ that contains your application, plus connectors and libraries +that you may have added as dependencies to the application: `target/<artifact-id>-<version>.jar`. + +__Note:__ If you use a different class than *StreamingJob* as the application's main class / entry point, +we recommend you change the `mainClass` setting in the `pom.xml` file accordingly. That way, Flink +can run the application from the JAR file without additionally specifying the main class. + +## Gradle + +#### Requirements + +The only requirements are working __Gradle 3.x__ (or higher) and __Java 8.x__ installations. + +#### Create Project + +Use one of the following commands to __create a project__: + +<ul class="nav nav-tabs" style="border-bottom: none;"> + <li class="active"><a href="#gradle-example" data-toggle="tab"><strong>Gradle example</strong></a></li> + <li><a href="#gradle-script" data-toggle="tab">Run the <strong>quickstart script</strong></a></li> +</ul> +<div class="tab-content"> + <div class="tab-pane active" id="gradle-example"> + + <ul class="nav nav-tabs" style="border-bottom: none;"> + <li class="active"><a href="#gradle-build" data-toggle="tab"><tt>build.gradle</tt></a></li> + <li><a href="#gradle-settings" data-toggle="tab"><tt>settings.gradle</tt></a></li> + </ul> + <div class="tab-content"> +<!-- NOTE: Any change to the build scripts here should also be reflected in flink-web/q/gradle-quickstart.sh !! 
--> + <div class="tab-pane active" id="gradle-build"> +{% highlight gradle %} +buildscript { + repositories { + jcenter() // this applies only to the Gradle 'Shadow' plugin + } + dependencies { + classpath 'com.github.jengelman.gradle.plugins:shadow:2.0.4' + } +} + +plugins { + id 'java' + id 'application' + // shadow plugin to produce fat JARs + id 'com.github.johnrengelman.shadow' version '2.0.4' +} + + +// artifact properties +group = 'org.myorg.quickstart' +version = '0.1-SNAPSHOT' +mainClassName = 'org.myorg.quickstart.StreamingJob' +description = """Flink Quickstart Job""" + +ext { + javaVersion = '1.8' + flinkVersion = '{{ site.version }}' + scalaBinaryVersion = '{{ site.scala_version }}' + slf4jVersion = '1.7.15' + log4jVersion = '2.12.1' +} + + +sourceCompatibility = javaVersion +targetCompatibility = javaVersion +tasks.withType(JavaCompile) { + options.encoding = 'UTF-8' +} + +applicationDefaultJvmArgs = ["-Dlog4j.configurationFile=log4j2.properties"] + +task wrapper(type: Wrapper) { + gradleVersion = '3.1' +} + +// declare where to find the dependencies of your project +repositories { + mavenCentral() + maven { url "https://repository.apache.org/content/repositories/snapshots/" } +} + +// NOTE: We cannot use "compileOnly" or "shadow" configurations since then we could not run code +// in the IDE or with "gradle run". We also cannot exclude transitive dependencies from the +// shadowJar yet (see https://github.com/johnrengelman/shadow/issues/159). +// -> Explicitly define the // libraries we want to be included in the "flinkShadowJar" configuration! +configurations { + flinkShadowJar // dependencies which go into the shadowJar + + // always exclude these (also from transitive dependencies) since they are provided by Flink + flinkShadowJar.exclude group: 'org.apache.flink', module: 'force-shading' + flinkShadowJar.exclude group: 'com.google.code.findbugs', module: 'jsr305' + flinkShadowJar.exclude group: 'org.slf4j' + flinkShadowJar.exclude group: 'org.apache.logging.log4j' +} + +// declare the dependencies for your production and test code +dependencies { + // -------------------------------------------------------------- + // Compile-time dependencies that should NOT be part of the + // shadow jar and are provided in the lib folder of Flink + // -------------------------------------------------------------- + compile "org.apache.flink:flink-streaming-java_${scalaBinaryVersion}:${flinkVersion}" + + // -------------------------------------------------------------- + // Dependencies that should be part of the shadow jar, e.g. + // connectors. These must be in the flinkShadowJar configuration! + // -------------------------------------------------------------- + //flinkShadowJar "org.apache.flink:flink-connector-kafka-0.11_${scalaBinaryVersion}:${flinkVersion}" + + compile "org.apache.logging.log4j:log4j-api:${log4jVersion}" + compile "org.apache.logging.log4j:log4j-core:${log4jVersion}" + compile "org.apache.logging.log4j:log4j-slf4j-impl:${log4jVersion}" + compile "org.slf4j:slf4j-log4j12:${slf4jVersion}" + + // Add test dependencies here. 
+ // testCompile "junit:junit:4.12" +} + +// make compileOnly dependencies available for tests: +sourceSets { + main.compileClasspath += configurations.flinkShadowJar + main.runtimeClasspath += configurations.flinkShadowJar + + test.compileClasspath += configurations.flinkShadowJar + test.runtimeClasspath += configurations.flinkShadowJar + + javadoc.classpath += configurations.flinkShadowJar +} + +run.classpath = sourceSets.main.runtimeClasspath + +jar { + manifest { + attributes 'Built-By': System.getProperty('user.name'), + 'Build-Jdk': System.getProperty('java.version') + } +} + +shadowJar { + configurations = [project.configurations.flinkShadowJar] +} +{% endhighlight %} + </div> + <div class="tab-pane" id="gradle-settings"> +{% highlight gradle %} +rootProject.name = 'quickstart' +{% endhighlight %} + </div> + </div> + </div> + + <div class="tab-pane" id="gradle-script"> +{% highlight bash %} +bash -c "$(curl https://flink.apache.org/q/gradle-quickstart.sh)" -- {{site.version}} {{site.scala_version}} +{% endhighlight %} + This allows you to <strong>name your newly created project</strong>. It will interactively ask + you for the project name, organization (also used for the package name), project version, + Scala and Flink version. + </div> +</div> + +We recommend you __import this project into your IDE__ to develop and +test it. IntelliJ IDEA supports Gradle projects after installing the `Gradle` plugin. +Eclipse does so via the [Eclipse Buildship](https://projects.eclipse.org/projects/tools.buildship) plugin +(make sure to specify a Gradle version >= 3.0 in the last step of the import wizard; the `shadow` plugin requires it). +You may also use [Gradle's IDE integration](https://docs.gradle.org/current/userguide/userguide.html#ide-integration) +to create project files from Gradle. + + +*Please note*: The default JVM heapsize for Java may be too +small for Flink. You have to manually increase it. +In Eclipse, choose `Run Configurations -> Arguments` and write into the `VM Arguments` box: `-Xmx800m`. +In IntelliJ IDEA recommended way to change JVM options is from the `Help | Edit Custom VM Options` menu. See [this article](https://intellij-support.jetbrains.com/hc/en-us/articles/206544869-Configuring-JVM-options-and-platform-properties) for details. + +#### Build Project + +If you want to __build/package your project__, go to your project directory and +run the '`gradle clean shadowJar`' command. +You will __find a JAR file__ that contains your application, plus connectors and libraries +that you may have added as dependencies to the application: `build/libs/<project-name>-<version>-all.jar`. + +__Note:__ If you use a different class than *StreamingJob* as the application's main class / entry point, +we recommend you change the `mainClassName` setting in the `build.gradle` file accordingly. That way, Flink +can run the application from the JAR file without additionally specifying the main class. 
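
For example, with a hypothetical entry-point class, that one-line change in `build.gradle` would look like:

{% highlight gradle %}
// hypothetical entry point - replace with your application's main class
mainClassName = 'org.myorg.quickstart.MyStreamingJob'
{% endhighlight %}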
+ +## SBT + +#### Create Project + +You can scaffold a new project via either of the following two methods: + +<ul class="nav nav-tabs" style="border-bottom: none;"> + <li class="active"><a href="#sbt_template" data-toggle="tab">Use the <strong>sbt template</strong></a></li> + <li><a href="#quickstart-script-sbt" data-toggle="tab">Run the <strong>quickstart script</strong></a></li> +</ul> + +<div class="tab-content"> + <div class="tab-pane active" id="sbt_template"> +{% highlight bash %} +$ sbt new tillrohrmann/flink-project.g8 +{% endhighlight %} + This will prompt you for a couple of parameters (project name, flink version...) and then create a Flink project from the <a href="https://github.com/tillrohrmann/flink-project.g8">flink-project template</a>. + You need sbt >= 0.13.13 to execute this command. You can follow this <a href="http://www.scala-sbt.org/download.html">installation guide</a> to obtain it if necessary. + </div> + <div class="tab-pane" id="quickstart-script-sbt"> +{% highlight bash %} +$ bash <(curl https://flink.apache.org/q/sbt-quickstart.sh) +{% endhighlight %} + This will create a Flink project in the <strong>specified</strong> project directory. + </div> +</div> + +#### Build Project + +In order to build your project you simply have to issue the `sbt clean assembly` command. +This will create the fat-jar __your-project-name-assembly-0.1-SNAPSHOT.jar__ in the directory __target/scala_your-major-scala-version/__. + +#### Run Project + +In order to run your project you have to issue the `sbt run` command. + +Per default, this will run your job in the same JVM as `sbt` is running. +In order to run your job in a distinct JVM, add the following line to `build.sbt` + +{% highlight scala %} +fork in run := true +{% endhighlight %} + + +#### IntelliJ + +We recommend using [IntelliJ](https://www.jetbrains.com/idea/) for your Flink job development. +In order to get started, you have to import your newly created project into IntelliJ. +You can do this via `File -> New -> Project from Existing Sources...` and then choosing your project's directory. +IntelliJ will then automatically detect the `build.sbt` file and set everything up. + +In order to run your Flink job, it is recommended to choose the `mainRunner` module as the classpath of your __Run/Debug Configuration__. +This will ensure, that all dependencies which are set to _provided_ will be available upon execution. +You can configure the __Run/Debug Configurations__ via `Run -> Edit Configurations...` and then choose `mainRunner` from the _Use classpath of module_ dropbox. + +#### Eclipse + +In order to import the newly created project into [Eclipse](https://eclipse.org/), you first have to create Eclipse project files for it. +These project files can be created via the [sbteclipse](https://github.com/typesafehub/sbteclipse) plugin. +Add the following line to your `PROJECT_DIR/project/plugins.sbt` file: + +{% highlight bash %} +addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "4.0.0") +{% endhighlight %} + +In `sbt` use the following command to create the Eclipse project files + +{% highlight bash %} +> eclipse +{% endhighlight %} + +Now you can import the project into Eclipse via `File -> Import... -> Existing Projects into Workspace` and then select the project directory. 
+ + +## Appendix: Template for building a Jar with Dependencies + +To build an application JAR that contains all dependencies required for declared connectors and libraries, +you can use the following shade plugin definition: + +{% highlight xml %} +<build> + <plugins> + <plugin> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-shade-plugin</artifactId> + <version>3.1.1</version> + <executions> + <execution> + <phase>package</phase> + <goals> + <goal>shade</goal> + </goals> + <configuration> + <artifactSet> + <excludes> + <exclude>com.google.code.findbugs:jsr305</exclude> + <exclude>org.slf4j:*</exclude> + <exclude>log4j:*</exclude> + </excludes> + </artifactSet> + <filters> + <filter> + <!-- Do not copy the signatures in the META-INF folder. + Otherwise, this might cause SecurityExceptions when using the JAR. --> + <artifact>*:*</artifact> + <excludes> + <exclude>META-INF/*.SF</exclude> + <exclude>META-INF/*.DSA</exclude> + <exclude>META-INF/*.RSA</exclude> + </excludes> + </filter> + </filters> + <transformers> + <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"> + <mainClass>my.programs.main.clazz</mainClass> + </transformer> + </transformers> + </configuration> + </execution> + </executions> + </plugin> + </plugins> +</build> +{% endhighlight %} + +{% top %} diff --git a/docs/getting-started/project-setup/dependencies.md b/docs/getting-started/project-setup/dependencies.md deleted file mode 100644 index 51f1dd0..0000000 --- a/docs/getting-started/project-setup/dependencies.md +++ /dev/null @@ -1,237 +0,0 @@ ---- -title: "Configuring Dependencies, Connectors, Libraries" -nav-parent_id: project-setup -nav-pos: 2 ---- -<!-- -Licensed to the Apache Software Foundation (ASF) under one -or more contributor license agreements. See the NOTICE file -distributed with this work for additional information -regarding copyright ownership. The ASF licenses this file -to you under the Apache License, Version 2.0 (the -"License"); you may not use this file except in compliance -with the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, -software distributed under the License is distributed on an -"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -KIND, either express or implied. See the License for the -specific language governing permissions and limitations -under the License. ---> - -Every Flink application depends on a set of Flink libraries. At the bare minimum, the application depends -on the Flink APIs. Many applications depend in addition on certain connector libraries (like Kafka, Cassandra, etc.). -When running Flink applications (either in a distributed deployment, or in the IDE for testing), the Flink -runtime library must be available as well. - - -## Flink Core and Application Dependencies - -As with most systems that run user-defined applications, there are two broad categories of dependencies and libraries in Flink: - - - **Flink Core Dependencies**: Flink itself consists of a set of classes and dependencies that are needed to run the system, for example - coordination, networking, checkpoints, failover, APIs, operations (such as windowing), resource management, etc. - The set of all these classes and dependencies forms the core of Flink's runtime and must be present when a Flink - application is started. - - These core classes and dependencies are packaged in the `flink-dist` jar. 
They are part of Flink's `lib` folder and - part of the basic Flink container images. Think of these dependencies as similar to Java's core library (`rt.jar`, `charsets.jar`, etc.), - which contains the classes like `String` and `List`. - - The Flink Core Dependencies do not contain any connectors or libraries (CEP, SQL, ML, etc.) in order to avoid having an excessive - number of dependencies and classes in the classpath by default. In fact, we try to keep the core dependencies as slim as possible - to keep the default classpath small and avoid dependency clashes. - - - The **User Application Dependencies** are all connectors, formats, or libraries that a specific user application needs. - - The user application is typically packaged into an *application jar*, which contains the application code and the required - connector and library dependencies. - - The user application dependencies explicitly do not include the Flink DataSet / DataStream APIs and runtime dependencies, - because those are already part of Flink's Core Dependencies. - - -## Setting up a Project: Basic Dependencies - -Every Flink application needs as the bare minimum the API dependencies, to develop against. -For Maven, you can use the [Java Project Template]({{ site.baseurl }}/dev/projectsetup/java_api_quickstart.html) -or [Scala Project Template]({{ site.baseurl }}/dev/projectsetup/scala_api_quickstart.html) to create -a program skeleton with these initial dependencies. - -When setting up a project manually, you need to add the following dependencies for the Java/Scala API -(here presented in Maven syntax, but the same dependencies apply to other build tools (Gradle, SBT, etc.) as well. - -<div class="codetabs" markdown="1"> -<div data-lang="java" markdown="1"> -{% highlight xml %} -<dependency> - <groupId>org.apache.flink</groupId> - <artifactId>flink-java</artifactId> - <version>{{site.version }}</version> - <scope>provided</scope> -</dependency> -<dependency> - <groupId>org.apache.flink</groupId> - <artifactId>flink-streaming-java{{ site.scala_version_suffix }}</artifactId> - <version>{{site.version }}</version> - <scope>provided</scope> -</dependency> -{% endhighlight %} -</div> -<div data-lang="scala" markdown="1"> -{% highlight xml %} -<dependency> - <groupId>org.apache.flink</groupId> - <artifactId>flink-scala{{ site.scala_version_suffix }}</artifactId> - <version>{{site.version }}</version> - <scope>provided</scope> -</dependency> -<dependency> - <groupId>org.apache.flink</groupId> - <artifactId>flink-streaming-scala{{ site.scala_version_suffix }}</artifactId> - <version>{{site.version }}</version> - <scope>provided</scope> -</dependency> -{% endhighlight %} -</div> -</div> - -**Important:** Please note that all these dependencies have their scope set to *provided*. -That means that they are needed to compile against, but that they should not be packaged into the -project's resulting application jar file - these dependencies are Flink Core Dependencies, -which are already available in any setup. - -It is highly recommended to keep the dependencies in scope *provided*. If they are not set to *provided*, -the best case is that the resulting JAR becomes excessively large, because it also contains all Flink core -dependencies. The worst case is that the Flink core dependencies that are added to the application's jar file -clash with some of your own dependency versions (which is normally avoided through inverted classloading). 
- -**Note on IntelliJ:** To make the applications run within IntelliJ IDEA it is necessary to tick the -`Include dependencies with "Provided" scope` box in the run configuration. -If this option is not available (possibly due to using an older IntelliJ IDEA version), then a simple workaround -is to create a test that calls the applications `main()` method. - - -## Adding Connector and Library Dependencies - -Most applications need specific connectors or libraries to run, for example a connector to Kafka, Cassandra, etc. -These connectors are not part of Flink's core dependencies and must hence be added as dependencies to the application - -Below is an example adding the connector for Kafka 0.10 as a dependency (Maven syntax): -{% highlight xml %} -<dependency> - <groupId>org.apache.flink</groupId> - <artifactId>flink-connector-kafka-0.10{{ site.scala_version_suffix }}</artifactId> - <version>{{site.version }}</version> -</dependency> -{% endhighlight %} - -We recommend to package the application code and all its required dependencies into one *jar-with-dependencies* which -we refer to as the *application jar*. The application jar can be submitted to an already running Flink cluster, -or added to a Flink application container image. - -Projects created from the [Java Project Template]({{ site.baseurl }}/dev/projectsetup/java_api_quickstart.html) or -[Scala Project Template]({{ site.baseurl }}/dev/projectsetup/scala_api_quickstart.html) are configured to automatically include -the application dependencies into the application jar when running `mvn clean package`. For projects that are -not set up from those templates, we recommend to add the Maven Shade Plugin (as listed in the Appendix below) -to build the application jar with all required dependencies. - -**Important:** For Maven (and other build tools) to correctly package the dependencies into the application jar, -these application dependencies must be specified in scope *compile* (unlike the core dependencies, which -must be specified in scope *provided*). - - -## Scala Versions - -Scala versions (2.10, 2.11, 2.12, etc.) are not binary compatible with one another. -For that reason, Flink for Scala 2.11 cannot be used with an application that uses -Scala 2.12. - -All Flink dependencies that (transitively) depend on Scala are suffixed with the -Scala version that they are built for, for example `flink-streaming-scala_2.11`. - -Developers that only use Java can pick any Scala version, Scala developers need to -pick the Scala version that matches their application's Scala version. - -Please refer to the [build guide]({{ site.baseurl }}/flinkDev/building.html#scala-versions) -for details on how to build Flink for a specific Scala version. - -## Hadoop Dependencies - -**General rule: It should never be necessary to add Hadoop dependencies directly to your application.** -*(The only exception being when using existing Hadoop input-/output formats with Flink's Hadoop compatibility wrappers)* - -If you want to use Flink with Hadoop, you need to have a Flink setup that includes the Hadoop dependencies, rather than -adding Hadoop as an application dependency. Please refer to the [Hadoop Setup Guide]({{ site.baseurl }}/ops/deployment/hadoop.html) -for details. - -There are two main reasons for that design: - - - Some Hadoop interaction happens in Flink's core, possibly before the user application is started, for example - setting up HDFS for checkpoints, authenticating via Hadoop's Kerberos tokens, or deployment on YARN. 
- - - Flink's inverted classloading approach hides many transitive dependencies from the core dependencies. That applies not only - to Flink's own core dependencies, but also to Hadoop's dependencies when present in the setup. - That way, applications can use different versions of the same dependencies without running into dependency conflicts (and - trust us, that's a big deal, because Hadoop's dependency tree is huge). - -If you need Hadoop dependencies during testing or development inside the IDE (for example for HDFS access), please declare -these dependencies with their scope set to *test* or *provided*. - - -## Appendix: Template for building a Jar with Dependencies - -To build an application JAR that contains all dependencies required for declared connectors and libraries, -you can use the following shade plugin definition: - -{% highlight xml %} -<build> - <plugins> - <plugin> - <groupId>org.apache.maven.plugins</groupId> - <artifactId>maven-shade-plugin</artifactId> - <version>3.1.1</version> - <executions> - <execution> - <phase>package</phase> - <goals> - <goal>shade</goal> - </goals> - <configuration> - <artifactSet> - <excludes> - <exclude>com.google.code.findbugs:jsr305</exclude> - <exclude>org.slf4j:*</exclude> - <exclude>log4j:*</exclude> - </excludes> - </artifactSet> - <filters> - <filter> - <!-- Do not copy the signatures in the META-INF folder. - Otherwise, this might cause SecurityExceptions when using the JAR. --> - <artifact>*:*</artifact> - <excludes> - <exclude>META-INF/*.SF</exclude> - <exclude>META-INF/*.DSA</exclude> - <exclude>META-INF/*.RSA</exclude> - </excludes> - </filter> - </filters> - <transformers> - <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"> - <mainClass>my.programs.main.clazz</mainClass> - </transformer> - </transformers> - </configuration> - </execution> - </executions> - </plugin> - </plugins> -</build> -{% endhighlight %} - -{% top %} - diff --git a/docs/getting-started/project-setup/dependencies.zh.md b/docs/getting-started/project-setup/dependencies.zh.md deleted file mode 100644 index 310000f..0000000 --- a/docs/getting-started/project-setup/dependencies.zh.md +++ /dev/null @@ -1,200 +0,0 @@ ---- -title: "配置依赖、连接器、类库" -nav-parent_id: project-setup -nav-pos: 2 ---- -<!-- -Licensed to the Apache Software Foundation (ASF) under one -or more contributor license agreements. See the NOTICE file -distributed with this work for additional information -regarding copyright ownership. The ASF licenses this file -to you under the Apache License, Version 2.0 (the -"License"); you may not use this file except in compliance -with the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, -software distributed under the License is distributed on an -"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -KIND, either express or implied. See the License for the -specific language governing permissions and limitations -under the License.
---> - -每个 Flink 应用都需要依赖一组 Flink 类库。Flink 应用至少需要依赖 Flink APIs。许多应用还会额外依赖连接器类库(比如 Kafka、Cassandra 等)。 -当用户运行 Flink 应用时(无论是在 IDE 环境下进行测试,还是部署在分布式环境下),运行时类库都必须可用。 - -## Flink 核心依赖以及应用依赖 - -与其他运行用户自定义应用的大多数系统一样,Flink 中有两大类依赖类库 - - - **Flink 核心依赖**:Flink 本身包含运行所需的一组类和依赖,比如协调、网络通讯、checkpoint、容错处理、API、算子(如窗口操作)、 - 资源管理等,这些类和依赖形成了 Flink 运行时的核心。当 Flink 应用启动时,这些依赖必须可用。 - - 这些核心类和依赖被打包在 `flink-dist` jar 里。它们是 Flink `lib` 文件夹下的一部分,也是 Flink 基本容器镜像的一部分。 - 这些依赖类似 Java `String` 和 `List` 的核心类库(`rt.jar`, `charsets.jar`等)。 - - Flink 核心依赖不包含连接器和类库(如 CEP、SQL、ML 等),这样做的目的是默认情况下避免在类路径中具有过多的依赖项和类。 - 实际上,我们希望尽可能保持核心依赖足够精简,以保证一个较小的默认类路径,并且避免依赖冲突。 - - - **用户应用依赖** 是指特定的应用程序需要的类库,如连接器,formats等。 - - 用户应用代码和所需的连接器以及其他类库依赖通常被打包到 *application jar* 中。 - - 用户应用程序依赖项不需包括 Flink DataSet / DataStream API 以及运行时依赖项,因为它们已经是 Flink 核心依赖项的一部分。 - -## 搭建一个项目: 基础依赖 - -开发 Flink 应用程序需要最低限度的 API 依赖。Maven 用户,可以使用 -[Java 项目模板]({{ site.baseurl }}/zh/dev/projectsetup/java_api_quickstart.html)或者 -[Scala 项目模板]({{ site.baseurl }}/zh/dev/projectsetup/scala_api_quickstart.html)来创建一个包含最初依赖的程序骨架。 - -手动设置项目时,需要为 Java 或 Scala API 添加以下依赖项(这里以 Maven 语法为例,但也适用于其他构建工具(Gradle、 SBT 等))。 - -<div class="codetabs" markdown="1"> -<div data-lang="java" markdown="1"> -{% highlight xml %} -<dependency> - <groupId>org.apache.flink</groupId> - <artifactId>flink-java</artifactId> - <version>{{site.version }}</version> - <scope>provided</scope> -</dependency> -<dependency> - <groupId>org.apache.flink</groupId> - <artifactId>flink-streaming-java{{ site.scala_version_suffix }}</artifactId> - <version>{{site.version }}</version> - <scope>provided</scope> -</dependency> -{% endhighlight %} -</div> -<div data-lang="scala" markdown="1"> -{% highlight xml %} -<dependency> - <groupId>org.apache.flink</groupId> - <artifactId>flink-scala{{ site.scala_version_suffix }}</artifactId> - <version>{{site.version }}</version> - <scope>provided</scope> -</dependency> -<dependency> - <groupId>org.apache.flink</groupId> - <artifactId>flink-streaming-scala{{ site.scala_version_suffix }}</artifactId> - <version>{{site.version }}</version> - <scope>provided</scope> -</dependency> -{% endhighlight %} -</div> -</div> - -**注意事项:** 所有这些依赖项的作用域都应该设置为 *provided* 。 -这意味着需要这些依赖进行编译,但不应将它们打包到项目生成的应用程序jar文件中-- -因为这些依赖项是 Flink 的核心依赖,在应用启动前已经是可用的状态了。 - -我们强烈建议保持这些依赖的作用域为 *provided* 。 如果它们的作用域未设置为 *provided* ,则典型的情况是因为包含了 Flink 的核心依赖而导致生成的jar包变得过大。 -最糟糕的情况是添加到应用程序的 Flink 核心依赖项与你自己的一些依赖项版本冲突(通常通过反向类加载来避免)。 - -**IntelliJ 上的一些注意事项:** 为了可以让 Flink 应用在 IntelliJ IDEA 中运行,需要在 Run 配置界面将 `Include dependencies with "Provided" scope` 选项勾选上。 -如果这一选项还不可用(可能是因为你正在使用一个老版本的 IntelliJ IDEA),那么一个简单的解决方案是创建一个测试,并在测试中调用应用程序的 `main()` 方法。 - -## 添加连接器以及类库依赖 - -大多数应用需要依赖特定的连接器或其他类库,例如 Kafka、Cassandra 的连接器等。这些连接器不是 Flink 核心依赖的一部分,因此必须作为依赖项手动添加到应用程序中。 - -下面是添加 Kafka 0.10 连接器依赖(Maven 语法)的示例: -{% highlight xml %} -<dependency> - <groupId>org.apache.flink</groupId> - <artifactId>flink-connector-kafka-0.10{{ site.scala_version_suffix }}</artifactId> - <version>{{site.version }}</version> -</dependency> -{% endhighlight %} - -我们建议将应用程序代码及其所有需要的依赖项打包到一个 *jar-with-dependencies* 的 jar 包中。 -这个打包好的应用 jar 可以提交到已经运行的 Flink 集群中,或者添加到 Flink 应用容器镜像中。 - -通过[Java 项目模板]({{ site.baseurl }}/zh/dev/projectsetup/java_api_quickstart.html) 或者 -[Scala 项目模板]({{ site.baseurl }}/zh/dev/projectsetup/scala_api_quickstart.html) 创建的应用, -当使用命令 `mvn clean package` 打包的时候会自动将应用依赖类库打包进应用 jar 包。 -对于不是通过上面模板创建的应用,我们推荐添加 Maven Shade Plugin 去构建应用。(下面的附录会给出具体配置) - -**注意:** 要使 Maven(以及其他构建工具)正确地将依赖项打包到应用程序 jar 中,必须将这些依赖项的作用域设置为 *compile* (与核心依赖项不同,后者作用域应该设置为 *provided* )。 
- -## Scala 版本 - -Scala 版本(2.10、2.11、2.12等)互相是不兼容的。因此,依赖 Scala 2.11 的 Flink 环境是不可以运行依赖 Scala 2.12 应用的。 - -所有依赖 Scala 的 Flink 类库都以它们依赖的 Scala 版本为后缀,例如 `flink-streaming-scala_2.11`。 - -只使用 Java 的开发人员可以选择任何 Scala 版本,Scala 开发人员需要选择与其应用程序相匹配的 Scala 版本。 - -对于指定的 Scala 版本如何构建 Flink 应用可以参考 [构建指南]({{ site.baseurl }}/zh/flinkDev/building.html#scala-versions)。 - -## Hadoop 版本 - -**一般规则:永远不需要将 Hadoop 依赖项直接添加到你的应用程序中** -*(唯一的例外是使用 Flink 的 Hadoop 兼容包装器来处理 Hadoop 格式的输入/输出时)* - -如果你想要在 Flink 应用中使用 Hadoop,你需要使用包含 Hadoop 依赖的 Flink,而非将 Hadoop 作为应用依赖进行添加。 -请参考[Hadoop 构建指南]({{ site.baseurl }}/zh/ops/deployment/hadoop.html) - -这样设计是出于两个主要原因: - - - 可能在用户程序启动之前,一些 Hadoop 交互操作就已经发生在 Flink 核心中了,比如为 checkpoint 设置 HDFS 路径,通过 Hadoop's Kerberos tokens 进行权限认证以及进行 YARN 部署等。 - - - Flink 的反向类加载方法隐藏了核心依赖关系中的许多传递依赖关系。这不仅适用于 Flink 自己的核心依赖项,也适用于 Hadoop 在启动中存在的依赖项。 - 通过这种方式,应用程序可以使用相同依赖项的不同版本,而不会引起依赖项的冲突(若非如此可能会引起严重依赖问题,因为 Hadoops 依赖树十分庞大。) - -如果在 IDE 内部进行开发或测试的过程中需要 Hadoop 依赖项(例如用于 HDFS 访问),请将这些依赖项的作用域设置为 *test* 或 *provided* 。 - -## 附录:构建带有依赖的应用 jar 包模板 - -可以通过下面的 shade plugin 配置来构建包含所有依赖项的应用 jar 包 - -{% highlight xml %} -<build> - <plugins> - <plugin> - <groupId>org.apache.maven.plugins</groupId> - <artifactId>maven-shade-plugin</artifactId> - <version>3.1.1</version> - <executions> - <execution> - <phase>package</phase> - <goals> - <goal>shade</goal> - </goals> - <configuration> - <artifactSet> - <excludes> - <exclude>com.google.code.findbugs:jsr305</exclude> - <exclude>org.slf4j:*</exclude> - <exclude>log4j:*</exclude> - </excludes> - </artifactSet> - <filters> - <filter> - <!--不要拷贝 META-INF 目录下的签名, - 否则会引起 SecurityExceptions 。 --> - <artifact>*:*</artifact> - <excludes> - <exclude>META-INF/*.SF</exclude> - <exclude>META-INF/*.DSA</exclude> - <exclude>META-INF/*.RSA</exclude> - </excludes> - </filter> - </filters> - <transformers> - <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"> - <mainClass>my.programs.main.clazz</mainClass> - </transformer> - </transformers> - </configuration> - </execution> - </executions> - </plugin> - </plugins> -</build> -{% endhighlight %} - -{% top %} diff --git a/docs/getting-started/project-setup/java_api_quickstart.md b/docs/getting-started/project-setup/java_api_quickstart.md deleted file mode 100644 index 07fc8ef..0000000 --- a/docs/getting-started/project-setup/java_api_quickstart.md +++ /dev/null @@ -1,375 +0,0 @@ ---- -title: "Project Template for Java" -nav-title: Project Template for Java -nav-parent_id: project-setup -nav-pos: 0 ---- -<!-- -Licensed to the Apache Software Foundation (ASF) under one -or more contributor license agreements. See the NOTICE file -distributed with this work for additional information -regarding copyright ownership. The ASF licenses this file -to you under the Apache License, Version 2.0 (the -"License"); you may not use this file except in compliance -with the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, -software distributed under the License is distributed on an -"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -KIND, either express or implied. See the License for the -specific language governing permissions and limitations -under the License. ---> - -* This will be replaced by the TOC -{:toc} - - -## Build Tools - -Flink projects can be built with different build tools. 
-In order to get started quickly, Flink provides project templates for the following build tools: - -- [Maven](#maven) -- [Gradle](#gradle) - -These templates help you to set up the project structure and to create the initial build files. - -## Maven - -### Requirements - -The only requirements are working __Maven 3.0.4__ (or higher) and __Java 8.x__ installations. - -### Create Project - -Use one of the following commands to __create a project__: - -<ul class="nav nav-tabs" style="border-bottom: none;"> - <li class="active"><a href="#maven-archetype" data-toggle="tab">Use <strong>Maven archetypes</strong></a></li> - <li><a href="#quickstart-script" data-toggle="tab">Run the <strong>quickstart script</strong></a></li> -</ul> -<div class="tab-content"> - <div class="tab-pane active" id="maven-archetype"> -{% highlight bash %} -$ mvn archetype:generate \ - -DarchetypeGroupId=org.apache.flink \ - -DarchetypeArtifactId=flink-quickstart-java \{% unless site.is_stable %} - -DarchetypeCatalog=https://repository.apache.org/content/repositories/snapshots/ \{% endunless %} - -DarchetypeVersion={{site.version}} -{% endhighlight %} - This allows you to <strong>name your newly created project</strong>. It will interactively ask you for the groupId, artifactId, and package name. - </div> - <div class="tab-pane" id="quickstart-script"> -{% highlight bash %} -{% if site.is_stable %} -$ curl https://flink.apache.org/q/quickstart.sh | bash -s {{site.version}} -{% else %} -$ curl https://flink.apache.org/q/quickstart-SNAPSHOT.sh | bash -s {{site.version}} -{% endif %} -{% endhighlight %} - - </div> - {% unless site.is_stable %} - <p style="border-radius: 5px; padding: 5px" class="bg-danger"> - <b>Note</b>: For Maven 3.0 or higher, it is no longer possible to specify the repository (-DarchetypeCatalog) via the command line. For details about this change, please refer to <a href="http://maven.apache.org/archetype/maven-archetype-plugin/archetype-repository.html">Maven official document</a> - If you wish to use the snapshot repository, you need to add a repository entry to your settings.xml. For example: -{% highlight bash %} -<settings> - <activeProfiles> - <activeProfile>apache</activeProfile> - </activeProfiles> - <profiles> - <profile> - <id>apache</id> - <repositories> - <repository> - <id>apache-snapshots</id> - <url>https://repository.apache.org/content/repositories/snapshots/</url> - </repository> - </repositories> - </profile> - </profiles> -</settings> -{% endhighlight %} - </p> - {% endunless %} -</div> - -### Inspect Project - -There will be a new directory in your working directory. If you've used -the _curl_ approach, the directory is called `quickstart`. Otherwise, -it has the name of your `artifactId`: - -{% highlight bash %} -$ tree quickstart/ -quickstart/ -├── pom.xml -└── src - └── main - ├── java - │ └── org - │ └── myorg - │ └── quickstart - │ ├── BatchJob.java - │ └── StreamingJob.java - └── resources - └── log4j2.properties -{% endhighlight %} - -The sample project is a __Maven project__, which contains two classes: _StreamingJob_ and _BatchJob_ are the basic skeleton programs for a *DataStream* and *DataSet* program. -The _main_ method is the entry point of the program, both for in-IDE testing/execution and for proper deployments. - -We recommend you __import this project into your IDE__ to develop and -test it. IntelliJ IDEA supports Maven projects out of the box. 
-If you use Eclipse, the [m2e plugin](http://www.eclipse.org/m2e/) -allows to [import Maven projects](http://books.sonatype.com/m2eclipse-book/reference/creating-sect-importing-projects.html#fig-creating-import). -Some Eclipse bundles include that plugin by default, others require you -to install it manually. - -*Please note*: The default JVM heapsize for Java may be too -small for Flink. You have to manually increase it. -In Eclipse, choose `Run Configurations -> Arguments` and write into the `VM Arguments` box: `-Xmx800m`. -In IntelliJ IDEA recommended way to change JVM options is from the `Help | Edit Custom VM Options` menu. See [this article](https://intellij-support.jetbrains.com/hc/en-us/articles/206544869-Configuring-JVM-options-and-platform-properties) for details. - -### Build Project - -If you want to __build/package your project__, go to your project directory and -run the '`mvn clean package`' command. -You will __find a JAR file__ that contains your application, plus connectors and libraries -that you may have added as dependencies to the application: `target/<artifact-id>-<version>.jar`. - -__Note:__ If you use a different class than *StreamingJob* as the application's main class / entry point, -we recommend you change the `mainClass` setting in the `pom.xml` file accordingly. That way, Flink -can run the application from the JAR file without additionally specifying the main class. - -## Gradle - -### Requirements - -The only requirements are working __Gradle 3.x__ (or higher) and __Java 8.x__ installations. - -### Create Project - -Use one of the following commands to __create a project__: - -<ul class="nav nav-tabs" style="border-bottom: none;"> - <li class="active"><a href="#gradle-example" data-toggle="tab"><strong>Gradle example</strong></a></li> - <li><a href="#gradle-script" data-toggle="tab">Run the <strong>quickstart script</strong></a></li> -</ul> -<div class="tab-content"> - <div class="tab-pane active" id="gradle-example"> - - <ul class="nav nav-tabs" style="border-bottom: none;"> - <li class="active"><a href="#gradle-build" data-toggle="tab"><tt>build.gradle</tt></a></li> - <li><a href="#gradle-settings" data-toggle="tab"><tt>settings.gradle</tt></a></li> - </ul> - <div class="tab-content"> -<!-- NOTE: Any change to the build scripts here should also be reflected in flink-web/q/gradle-quickstart.sh !! 
--> - <div class="tab-pane active" id="gradle-build"> -{% highlight gradle %} -buildscript { - repositories { - jcenter() // this applies only to the Gradle 'Shadow' plugin - } - dependencies { - classpath 'com.github.jengelman.gradle.plugins:shadow:2.0.4' - } -} - -plugins { - id 'java' - id 'application' - // shadow plugin to produce fat JARs - id 'com.github.johnrengelman.shadow' version '2.0.4' -} - - -// artifact properties -group = 'org.myorg.quickstart' -version = '0.1-SNAPSHOT' -mainClassName = 'org.myorg.quickstart.StreamingJob' -description = """Flink Quickstart Job""" - -ext { - javaVersion = '1.8' - flinkVersion = '{{ site.version }}' - scalaBinaryVersion = '{{ site.scala_version }}' - slf4jVersion = '1.7.15' - log4jVersion = '2.12.1' -} - - -sourceCompatibility = javaVersion -targetCompatibility = javaVersion -tasks.withType(JavaCompile) { - options.encoding = 'UTF-8' -} - -applicationDefaultJvmArgs = ["-Dlog4j.configurationFile=log4j2.properties"] - -task wrapper(type: Wrapper) { - gradleVersion = '3.1' -} - -// declare where to find the dependencies of your project -repositories { - mavenCentral() - maven { url "https://repository.apache.org/content/repositories/snapshots/" } -} - -// NOTE: We cannot use "compileOnly" or "shadow" configurations since then we could not run code -// in the IDE or with "gradle run". We also cannot exclude transitive dependencies from the -// shadowJar yet (see https://github.com/johnrengelman/shadow/issues/159). -// -> Explicitly define the // libraries we want to be included in the "flinkShadowJar" configuration! -configurations { - flinkShadowJar // dependencies which go into the shadowJar - - // always exclude these (also from transitive dependencies) since they are provided by Flink - flinkShadowJar.exclude group: 'org.apache.flink', module: 'force-shading' - flinkShadowJar.exclude group: 'com.google.code.findbugs', module: 'jsr305' - flinkShadowJar.exclude group: 'org.slf4j' - flinkShadowJar.exclude group: 'org.apache.logging.log4j' -} - -// declare the dependencies for your production and test code -dependencies { - // -------------------------------------------------------------- - // Compile-time dependencies that should NOT be part of the - // shadow jar and are provided in the lib folder of Flink - // -------------------------------------------------------------- - compile "org.apache.flink:flink-java:${flinkVersion}" - compile "org.apache.flink:flink-streaming-java_${scalaBinaryVersion}:${flinkVersion}" - - // -------------------------------------------------------------- - // Dependencies that should be part of the shadow jar, e.g. - // connectors. These must be in the flinkShadowJar configuration! - // -------------------------------------------------------------- - //flinkShadowJar "org.apache.flink:flink-connector-kafka-0.11_${scalaBinaryVersion}:${flinkVersion}" - - compile "org.apache.logging.log4j:log4j-api:${log4jVersion}" - compile "org.apache.logging.log4j:log4j-core:${log4jVersion}" - compile "org.apache.logging.log4j:log4j-slf4j-impl:${log4jVersion}" - compile "org.slf4j:slf4j-log4j12:${slf4jVersion}" - - // Add test dependencies here. 
- // testCompile "junit:junit:4.12" -} - -// make compileOnly dependencies available for tests: -sourceSets { - main.compileClasspath += configurations.flinkShadowJar - main.runtimeClasspath += configurations.flinkShadowJar - - test.compileClasspath += configurations.flinkShadowJar - test.runtimeClasspath += configurations.flinkShadowJar - - javadoc.classpath += configurations.flinkShadowJar -} - -run.classpath = sourceSets.main.runtimeClasspath - -jar { - manifest { - attributes 'Built-By': System.getProperty('user.name'), - 'Build-Jdk': System.getProperty('java.version') - } -} - -shadowJar { - configurations = [project.configurations.flinkShadowJar] -} -{% endhighlight %} - </div> - <div class="tab-pane" id="gradle-settings"> -{% highlight gradle %} -rootProject.name = 'quickstart' -{% endhighlight %} - </div> - </div> - </div> - - <div class="tab-pane" id="gradle-script"> -{% highlight bash %} -bash -c "$(curl https://flink.apache.org/q/gradle-quickstart.sh)" -- {{site.version}} {{site.scala_version}} -{% endhighlight %} - This allows you to <strong>name your newly created project</strong>. It will interactively ask - you for the project name, organization (also used for the package name), project version, - Scala and Flink version. - </div> -</div> - -### Inspect Project - -There will be a new directory in your working directory based on the -project name you provided, e.g. for `quickstart`: - -{% highlight bash %} -$ tree quickstart/ -quickstart/ -├── README -├── build.gradle -├── settings.gradle -└── src - └── main - ├── java - │ └── org - │ └── myorg - │ └── quickstart - │ ├── BatchJob.java - │ └── StreamingJob.java - └── resources - └── log4j2.properties -{% endhighlight %} - -The sample project is a __Gradle project__, which contains two classes: _StreamingJob_ and _BatchJob_ are the basic skeleton programs for a *DataStream* and *DataSet* program. -The _main_ method is the entry point of the program, both for in-IDE testing/execution and for proper deployments. - -We recommend you __import this project into your IDE__ to develop and -test it. IntelliJ IDEA supports Gradle projects after installing the `Gradle` plugin. -Eclipse does so via the [Eclipse Buildship](https://projects.eclipse.org/projects/tools.buildship) plugin -(make sure to specify a Gradle version >= 3.0 in the last step of the import wizard; the `shadow` plugin requires it). -You may also use [Gradle's IDE integration](https://docs.gradle.org/current/userguide/userguide.html#ide-integration) -to create project files from Gradle. - - -*Please note*: The default JVM heapsize for Java may be too -small for Flink. You have to manually increase it. -In Eclipse, choose `Run Configurations -> Arguments` and write into the `VM Arguments` box: `-Xmx800m`. -In IntelliJ IDEA recommended way to change JVM options is from the `Help | Edit Custom VM Options` menu. See [this article](https://intellij-support.jetbrains.com/hc/en-us/articles/206544869-Configuring-JVM-options-and-platform-properties) for details. - -### Build Project - -If you want to __build/package your project__, go to your project directory and -run the '`gradle clean shadowJar`' command. -You will __find a JAR file__ that contains your application, plus connectors and libraries -that you may have added as dependencies to the application: `build/libs/<project-name>-<version>-all.jar`. 
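Those bundled connectors and libraries are exactly the dependencies declared in the `flinkShadowJar` configuration of the `build.gradle` above. As a rough illustration (not part of the generated template), if you enabled the commented-out Kafka 0.11 connector dependency, the application code that uses it might look like the following; the topic name, bootstrap servers, group id, and class name are placeholders:

{% highlight java %}
package org.myorg.quickstart;

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

public class KafkaReadJob {

    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // connection settings for a (placeholder) local Kafka cluster
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "localhost:9092");
        properties.setProperty("group.id", "quickstart");

        // FlinkKafkaConsumer011 comes from the connector declared in flinkShadowJar
        DataStream<String> lines = env.addSource(
                new FlinkKafkaConsumer011<>("my-topic", new SimpleStringSchema(), properties));

        lines.print();

        env.execute("Kafka read example");
    }
}
{% endhighlight %}

Because the connector classes are bundled into the shadow jar, such a job can run on a cluster whose `lib` folder does not contain them.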
- -__Note:__ If you use a different class than *StreamingJob* as the application's main class / entry point, -we recommend you change the `mainClassName` setting in the `build.gradle` file accordingly. That way, Flink -can run the application from the JAR file without additionally specifying the main class. - -## Next Steps - -Write your application! - -If you are writing a streaming application and you are looking for inspiration what to write, -take a look at the [Stream Processing Application Tutorial]({{ site.baseurl }}/getting-started/walkthroughs/datastream_api.html). - -If you are writing a batch processing application and you are looking for inspiration what to write, -take a look at the [Batch Application Examples]({{ site.baseurl }}/dev/batch/examples.html). - -For a complete overview over the APIs, have a look at the -[DataStream API]({{ site.baseurl }}/dev/datastream_api.html) and -[DataSet API]({{ site.baseurl }}/dev/batch/index.html) sections. - -[Here]({{ site.baseurl }}/ops/deployment/local.html) you can find out how to run an application outside the IDE on a local cluster. - -If you have any trouble, ask on our -[Mailing List](http://mail-archives.apache.org/mod_mbox/flink-user/). -We are happy to provide help. - -{% top %} diff --git a/docs/getting-started/project-setup/java_api_quickstart.zh.md b/docs/getting-started/project-setup/java_api_quickstart.zh.md deleted file mode 100644 index a5a0493..0000000 --- a/docs/getting-started/project-setup/java_api_quickstart.zh.md +++ /dev/null @@ -1,360 +0,0 @@ ---- -title: "Java 项目模板" -nav-title: Java 项目模板 -nav-parent_id: project-setup -nav-pos: 0 ---- -<!-- -Licensed to the Apache Software Foundation (ASF) under one -or more contributor license agreements. See the NOTICE file -distributed with this work for additional information -regarding copyright ownership. The ASF licenses this file -to you under the Apache License, Version 2.0 (the -"License"); you may not use this file except in compliance -with the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, -software distributed under the License is distributed on an -"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -KIND, either express or implied. See the License for the -specific language governing permissions and limitations -under the License. 
---> - -* This will be replaced by the TOC -{:toc} - - -## 构建工具 - -Flink项目可以使用不同的构建工具进行构建。 -为了能够快速入门,Flink 为以下构建工具提供了项目模版: - -- [Maven](#maven) -- [Gradle](#gradle) - -这些模版可以帮助你搭建项目结构并创建初始构建文件。 - -## Maven - -### 环境要求 - -唯一的要求是使用 __Maven 3.0.4__ (或更高版本)和安装 __Java 8.x__。 - -### 创建项目 - -使用以下命令之一来 __创建项目__: - -<ul class="nav nav-tabs" style="border-bottom: none;"> - <li class="active"><a href="#maven-archetype" data-toggle="tab">使用 <strong>Maven archetypes</strong></a></li> - <li><a href="#quickstart-script" data-toggle="tab">运行 <strong>quickstart 脚本</strong></a></li> -</ul> -<div class="tab-content"> - <div class="tab-pane active" id="maven-archetype"> -{% highlight bash %} -$ mvn archetype:generate \ - -DarchetypeGroupId=org.apache.flink \ - -DarchetypeArtifactId=flink-quickstart-java \{% unless site.is_stable %} - -DarchetypeCatalog=https://repository.apache.org/content/repositories/snapshots/ \{% endunless %} - -DarchetypeVersion={{site.version}} -{% endhighlight %} - 这种方式允许你<strong>为新项目命名</strong>。它将以交互式的方式询问你项目的 groupId、artifactId 和 package 名称。 - </div> - <div class="tab-pane" id="quickstart-script"> -{% highlight bash %} -{% if site.is_stable %} -$ curl https://flink.apache.org/q/quickstart.sh | bash -s {{site.version}} -{% else %} -$ curl https://flink.apache.org/q/quickstart-SNAPSHOT.sh | bash -s {{site.version}} -{% endif %} -{% endhighlight %} - - </div> - {% unless site.is_stable %} - <p style="border-radius: 5px; padding: 5px" class="bg-danger"> - <b>注意</b>:Maven 3.0 及更高版本,不再支持通过命令行指定仓库(-DarchetypeCatalog)。有关这个改动的详细信息, - 如果你希望使用快照仓库,则需要在 settings.xml 文件中添加一个仓库条目。有关这个改动的详细信息, - 请参阅 <a href="http://maven.apache.org/archetype/maven-archetype-plugin/archetype-repository.html">Maven 官方文档</a> 请参阅 <a href="http://maven.apache.org/archetype/maven-archetype-plugin/archetype-repository.html">Maven 官方文档</a> - 如果你希望使用快照仓库,则需要在 settings.xml 文件中添加一个仓库条目。例如: -{% highlight bash %} -<settings> - <activeProfiles> - <activeProfile>apache</activeProfile> - </activeProfiles> - <profiles> - <profile> - <id>apache</id> - <repositories> - <repository> - <id>apache-snapshots</id> - <url>https://repository.apache.org/content/repositories/snapshots/</url> - </repository> - </repositories> - </profile> - </profiles> -</settings> -{% endhighlight %} - </p> - {% endunless %} -</div> - -### 检查项目 - -项目创建后,工作目录将多出一个新目录。如果你使用的是 _curl_ 方式创建项目,目录名为 `quickstart`; -如果你使用的是 _Maven archetypes_ 方式创建项目,则目录名为你指定的 `artifactId`: - -{% highlight bash %} -$ tree quickstart/ -quickstart/ -├── pom.xml -└── src - └── main - ├── java - │ └── org - │ └── myorg - │ └── quickstart - │ ├── BatchJob.java - │ └── StreamingJob.java - └── resources - └── log4j2.properties -{% endhighlight %} - -示例项目是一个 __Maven project__,它包含了两个类:_StreamingJob_ 和 _BatchJob_ 分别是 *DataStream* and *DataSet* 程序的基础骨架程序。 -_main_ 方法是程序的入口,既可用于IDE测试/执行,也可用于部署。 - -我们建议你将 __此项目导入IDE__ 来开发和测试它。 -IntelliJ IDEA 支持 Maven 项目开箱即用。如果你使用的是 Eclipse,使用[m2e 插件](http://www.eclipse.org/m2e/) 可以 -[导入 Maven 项目](http://books.sonatype.com/m2eclipse-book/reference/creating-sect-importing-projects.html#fig-creating-import)。 -一些 Eclipse 捆绑包默认包含该插件,其他情况需要你手动安装。 - -*请注意*:对 Flink 来说,默认的 JVM 堆内存可能太小,你应当手动增加堆内存。 -在 Eclipse 中,选择 `Run Configurations -> Arguments` 并在 `VM Arguments` 对应的输入框中写入:`-Xmx800m`。 -在 IntelliJ IDEA 中,推荐从菜单 `Help | Edit Custom VM Options` 来修改 JVM 选项。有关详细信息,请参阅[这篇文章](https://intellij-support.jetbrains.com/hc/en-us/articles/206544869-Configuring-JVM-options-and-platform-properties)。 - -### 构建项目 - -如果你想要 __构建/打包你的项目__,请在项目目录下运行 '`mvn clean package`' 命令。 -命令执行后,你将 
__找到一个JAR文件__,里面包含了你的应用程序,以及已作为依赖项添加到应用程序的连接器和库:`target/<artifact-id>-<version>.jar`。 - -__注意:__ 如果你使用其他类而不是 *StreamingJob* 作为应用程序的主类/入口, -我们建议你相应地修改 `pom.xml` 文件中的 `mainClass` 配置。这样, -Flink 可以从 JAR 文件运行应用程序,而无需另外指定主类。 - -## Gradle - -### 环境要求 - -唯一的要求是使用 __Gradle 3.x__ (或更高版本) 和安装 __Java 8.x__ 。 - -### 创建项目 - -使用以下命令之一来 __创建项目__: - -<ul class="nav nav-tabs" style="border-bottom: none;"> - <li class="active"><a href="#gradle-example" data-toggle="tab"><strong>Gradle 示例</strong></a></li> - <li><a href="#gradle-script" data-toggle="tab">运行 <strong>quickstart 脚本</strong></a></li> -</ul> -<div class="tab-content"> - <div class="tab-pane active" id="gradle-example"> - - <ul class="nav nav-tabs" style="border-bottom: none;"> - <li class="active"><a href="#gradle-build" data-toggle="tab"><tt>build.gradle</tt></a></li> - <li><a href="#gradle-settings" data-toggle="tab"><tt>settings.gradle</tt></a></li> - </ul> - <div class="tab-content"> -<!-- NOTE: Any change to the build scripts here should also be reflected in flink-web/q/gradle-quickstart.sh !! --> - <div class="tab-pane active" id="gradle-build"> -{% highlight gradle %} -buildscript { - repositories { - jcenter() // this applies only to the Gradle 'Shadow' plugin - } - dependencies { - classpath 'com.github.jengelman.gradle.plugins:shadow:2.0.4' - } -} - -plugins { - id 'java' - id 'application' - // shadow plugin to produce fat JARs - id 'com.github.johnrengelman.shadow' version '2.0.4' -} - - -// artifact properties -group = 'org.myorg.quickstart' -version = '0.1-SNAPSHOT' -mainClassName = 'org.myorg.quickstart.StreamingJob' -description = """Flink Quickstart Job""" - -ext { - javaVersion = '1.8' - flinkVersion = '{{ site.version }}' - scalaBinaryVersion = '{{ site.scala_version }}' - slf4jVersion = '1.7.15' - log4jVersion = '2.12.1' -} - - -sourceCompatibility = javaVersion -targetCompatibility = javaVersion -tasks.withType(JavaCompile) { - options.encoding = 'UTF-8' -} - -applicationDefaultJvmArgs = ["-Dlog4j.configurationFile=log4j2.properties"] - -task wrapper(type: Wrapper) { - gradleVersion = '3.1' -} - -// declare where to find the dependencies of your project -repositories { - mavenCentral() - maven { url "https://repository.apache.org/content/repositories/snapshots/" } -} - -// 注意:我们不能使用 "compileOnly" 或者 "shadow" 配置,这会使我们无法在 IDE 中或通过使用 "gradle run" 命令运行代码。 -// 我们也不能从 shadowJar 中排除传递依赖(请查看 https://github.com/johnrengelman/shadow/issues/159)。 -// -> 显式定义我们想要包含在 "flinkShadowJar" 配置中的类库! -configurations { - flinkShadowJar // dependencies which go into the shadowJar - - // 总是排除这些依赖(也来自传递依赖),因为 Flink 会提供这些依赖。 - flinkShadowJar.exclude group: 'org.apache.flink', module: 'force-shading' - flinkShadowJar.exclude group: 'com.google.code.findbugs', module: 'jsr305' - flinkShadowJar.exclude group: 'org.slf4j' - flinkShadowJar.exclude group: 'org.apache.logging.log4j' -} - -// declare the dependencies for your production and test code -dependencies { - // -------------------------------------------------------------- - // 编译时依赖不应该包含在 shadow jar 中, - // 这些依赖会在 Flink 的 lib 目录中提供。 - // -------------------------------------------------------------- - compile "org.apache.flink:flink-java:${flinkVersion}" - compile "org.apache.flink:flink-streaming-java_${scalaBinaryVersion}:${flinkVersion}" - - // -------------------------------------------------------------- - // 应该包含在 shadow jar 中的依赖,例如:连接器。 - // 它们必须在 flinkShadowJar 的配置中! 
- // -------------------------------------------------------------- - //flinkShadowJar "org.apache.flink:flink-connector-kafka-0.11_${scalaBinaryVersion}:${flinkVersion}" - - compile "org.apache.logging.log4j:log4j-api:${log4jVersion}" - compile "org.apache.logging.log4j:log4j-core:${log4jVersion}" - compile "org.apache.logging.log4j:log4j-slf4j-impl:${log4jVersion}" - compile "org.slf4j:slf4j-log4j12:${slf4jVersion}" - - // Add test dependencies here. - // testCompile "junit:junit:4.12" -} - -// make compileOnly dependencies available for tests: -sourceSets { - main.compileClasspath += configurations.flinkShadowJar - main.runtimeClasspath += configurations.flinkShadowJar - - test.compileClasspath += configurations.flinkShadowJar - test.runtimeClasspath += configurations.flinkShadowJar - - javadoc.classpath += configurations.flinkShadowJar -} - -run.classpath = sourceSets.main.runtimeClasspath - -jar { - manifest { - attributes 'Built-By': System.getProperty('user.name'), - 'Build-Jdk': System.getProperty('java.version') - } -} - -shadowJar { - configurations = [project.configurations.flinkShadowJar] -} -{% endhighlight %} - </div> - <div class="tab-pane" id="gradle-settings"> -{% highlight gradle %} -rootProject.name = 'quickstart' -{% endhighlight %} - </div> - </div> - </div> - - <div class="tab-pane" id="gradle-script"> -{% highlight bash %} -bash -c "$(curl https://flink.apache.org/q/gradle-quickstart.sh)" -- {{site.version}} {{site.scala_version}} -{% endhighlight %} - 这种方式允许你<strong>为新项目命名</strong>。它将以交互式的方式询问你的项目名称、组织机构(也用于包名)、项目版本、Scala 和 Flink 版本。 - </div> -</div> - -### 检查项目 - -根据你提供的项目名称,工作目录中将多出一个新目录,例如 `quickstart`: - -{% highlight bash %} -$ tree quickstart/ -quickstart/ -├── README -├── build.gradle -├── settings.gradle -└── src - └── main - ├── java - │ └── org - │ └── myorg - │ └── quickstart - │ ├── BatchJob.java - │ └── StreamingJob.java - └── resources - └── log4j2.properties -{% endhighlight %} - -示例项目是一个 __Gradle 项目__,它包含了两个类:_StreamingJob_ 和 _BatchJob_ 是 *DataStream* 和 *DataSet* 程序的基础骨架程序。 -_main_ 方法是程序的入口,即可用于IDE测试/执行,也可用于部署。 - -我们建议你将 __此项目导入你的 IDE__ 来开发和测试它。 -IntelliJ IDEA 在安装 `Gradle` 插件后支持 Gradle 项目。Eclipse 则通过 [Eclipse Buildship](https://projects.eclipse.org/projects/tools.buildship) 插件支持 Gradle 项目(鉴于 `shadow` 插件对 Gradle 版本有要求,请确保在导入向导的最后一步指定 Gradle 版本 >= 3.0)。 -你也可以使用 [Gradle’s IDE integration](https://docs.gradle.org/current/userguide/userguide.html#ide-integration) 从 Gradle -创建项目文件。 - - -*请注意*:对 Flink 来说,默认的 JVM 堆内存可能太小,你应当手动增加堆内存。 -在 Eclipse中,选择 `Run Configurations -> Arguments` 并在 `VM Arguments` 对应的输入框中写入:`-Xmx800m`。 -在 IntelliJ IDEA 中,推荐从菜单 `Help | Edit Custom VM Options` 来修改 JVM 选项。有关详细信息,请参阅[此文章](https://intellij-support.jetbrains.com/hc/en-us/articles/206544869-Configuring-JVM-options-and-platform-properties)。 - -### 构建项目 - -如果你想要 __构建/打包项目__,请在项目目录下运行 '`gradle clean shadowJar`' 命令。 -命令执行后,你将 __找到一个 JAR 文件__,里面包含了你的应用程序,以及已作为依赖项添加到应用程序的连接器和库:`build/libs/<project-name>-<version>-all.jar`。 - -__注意:__ 如果你使用其他类而不是 *StreamingJob* 作为应用程序的主类/入口, -我们建议你相应地修改 `build.gradle` 文件中的 `mainClassName` 配置。 -这样,Flink 可以从 JAR 文件运行应用程序,而无需另外指定主类。 - -## 下一步 - -开始编写应用! 
- -如果你准备编写流处理应用,正在寻找灵感来写什么, -可以看看[流处理应用程序教程]({{ site.baseurl }}/zh/getting-started/walkthroughs/datastream_api.html) - -如果你准备编写批处理应用,正在寻找灵感来写什么, -可以看看[批处理应用程序示例]({{ site.baseurl }}/zh/dev/batch/examples.html) - -有关 API 的完整概述,请查看 -[DataStream API]({{ site.baseurl }}/zh/dev/datastream_api.html) 和 -[DataSet API]({{ site.baseurl }}/zh/dev/batch/index.html) 章节。 - -在[这里]({{ site.baseurl }}/zh/ops/deployment/local.html),你可以找到如何在 IDE 之外的本地集群中运行应用程序。 - -如果你有任何问题,请发信至我们的[邮箱列表](http://mail-archives.apache.org/mod_mbox/flink-user/),我们很乐意提供帮助。 - -{% top %} diff --git a/docs/getting-started/project-setup/scala_api_quickstart.md b/docs/getting-started/project-setup/scala_api_quickstart.md deleted file mode 100644 index 82e1368..0000000 --- a/docs/getting-started/project-setup/scala_api_quickstart.md +++ /dev/null @@ -1,249 +0,0 @@ ---- -title: "Project Template for Scala" -nav-title: Project Template for Scala -nav-parent_id: project-setup -nav-pos: 1 ---- -<!-- -Licensed to the Apache Software Foundation (ASF) under one -or more contributor license agreements. See the NOTICE file -distributed with this work for additional information -regarding copyright ownership. The ASF licenses this file -to you under the Apache License, Version 2.0 (the -"License"); you may not use this file except in compliance -with the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, -software distributed under the License is distributed on an -"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -KIND, either express or implied. See the License for the -specific language governing permissions and limitations -under the License. ---> - -* This will be replaced by the TOC -{:toc} - - -## Build Tools - -Flink projects can be built with different build tools. -In order to get started quickly, Flink provides project templates for the following build tools: - -- [SBT](#sbt) -- [Maven](#maven) - -These templates help you to set up the project structure and to create the initial build files. - -## SBT - -### Create Project - -You can scaffold a new project via either of the following two methods: - -<ul class="nav nav-tabs" style="border-bottom: none;"> - <li class="active"><a href="#sbt_template" data-toggle="tab">Use the <strong>sbt template</strong></a></li> - <li><a href="#quickstart-script-sbt" data-toggle="tab">Run the <strong>quickstart script</strong></a></li> -</ul> - -<div class="tab-content"> - <div class="tab-pane active" id="sbt_template"> -{% highlight bash %} -$ sbt new tillrohrmann/flink-project.g8 -{% endhighlight %} - This will prompt you for a couple of parameters (project name, flink version...) and then create a Flink project from the <a href="https://github.com/tillrohrmann/flink-project.g8">flink-project template</a>. - You need sbt >= 0.13.13 to execute this command. You can follow this <a href="http://www.scala-sbt.org/download.html">installation guide</a> to obtain it if necessary. - </div> - <div class="tab-pane" id="quickstart-script-sbt"> -{% highlight bash %} -$ bash <(curl https://flink.apache.org/q/sbt-quickstart.sh) -{% endhighlight %} - This will create a Flink project in the <strong>specified</strong> project directory. - </div> -</div> - -### Build Project - -In order to build your project you simply have to issue the `sbt clean assembly` command. -This will create the fat-jar __your-project-name-assembly-0.1-SNAPSHOT.jar__ in the directory __target/scala_your-major-scala-version/__. 
- -### Run Project - -In order to run your project you have to issue the `sbt run` command. - -Per default, this will run your job in the same JVM as `sbt` is running. -In order to run your job in a distinct JVM, add the following line to `build.sbt` - -{% highlight scala %} -fork in run := true -{% endhighlight %} - - -#### IntelliJ - -We recommend using [IntelliJ](https://www.jetbrains.com/idea/) for your Flink job development. -In order to get started, you have to import your newly created project into IntelliJ. -You can do this via `File -> New -> Project from Existing Sources...` and then choosing your project's directory. -IntelliJ will then automatically detect the `build.sbt` file and set everything up. - -In order to run your Flink job, it is recommended to choose the `mainRunner` module as the classpath of your __Run/Debug Configuration__. -This will ensure, that all dependencies which are set to _provided_ will be available upon execution. -You can configure the __Run/Debug Configurations__ via `Run -> Edit Configurations...` and then choose `mainRunner` from the _Use classpath of module_ dropbox. - -#### Eclipse - -In order to import the newly created project into [Eclipse](https://eclipse.org/), you first have to create Eclipse project files for it. -These project files can be created via the [sbteclipse](https://github.com/typesafehub/sbteclipse) plugin. -Add the following line to your `PROJECT_DIR/project/plugins.sbt` file: - -{% highlight bash %} -addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "4.0.0") -{% endhighlight %} - -In `sbt` use the following command to create the Eclipse project files - -{% highlight bash %} -> eclipse -{% endhighlight %} - -Now you can import the project into Eclipse via `File -> Import... -> Existing Projects into Workspace` and then select the project directory. - -## Maven - -### Requirements - -The only requirements are working __Maven 3.0.4__ (or higher) and __Java 8.x__ installations. - - -### Create Project - -Use one of the following commands to __create a project__: - -<ul class="nav nav-tabs" style="border-bottom: none;"> - <li class="active"><a href="#maven-archetype" data-toggle="tab">Use <strong>Maven archetypes</strong></a></li> - <li><a href="#quickstart-script" data-toggle="tab">Run the <strong>quickstart script</strong></a></li> -</ul> - -<div class="tab-content"> - <div class="tab-pane active" id="maven-archetype"> -{% highlight bash %} -$ mvn archetype:generate \ - -DarchetypeGroupId=org.apache.flink \ - -DarchetypeArtifactId=flink-quickstart-scala \{% unless site.is_stable %} - -DarchetypeCatalog=https://repository.apache.org/content/repositories/snapshots/ \{% endunless %} - -DarchetypeVersion={{site.version}} -{% endhighlight %} - This allows you to <strong>name your newly created project</strong>. It will interactively ask you for the groupId, artifactId, and package name. - </div> - <div class="tab-pane" id="quickstart-script"> -{% highlight bash %} -{% if site.is_stable %} -$ curl https://flink.apache.org/q/quickstart-scala.sh | bash -s {{site.version}} -{% else %} -$ curl https://flink.apache.org/q/quickstart-scala-SNAPSHOT.sh | bash -s {{site.version}} -{% endif %} -{% endhighlight %} - </div> - {% unless site.is_stable %} - <p style="border-radius: 5px; padding: 5px" class="bg-danger"> - <b>Note</b>: For Maven 3.0 or higher, it is no longer possible to specify the repository (-DarchetypeCatalog) via the command line. 
For details about this change, please refer to <a href="http://maven.apache.org/archetype/maven-archetype-plugin/archetype-repository.html">Maven official document</a> - If you wish to use the snapshot repository, you need to add a repository entry to your settings.xml. For example: -{% highlight bash %} -<settings> - <activeProfiles> - <activeProfile>apache</activeProfile> - </activeProfiles> - <profiles> - <profile> - <id>apache</id> - <repositories> - <repository> - <id>apache-snapshots</id> - <url>https://repository.apache.org/content/repositories/snapshots/</url> - </repository> - </repositories> - </profile> - </profiles> -</settings> -{% endhighlight %} - </p> - {% endunless %} -</div> - - -### Inspect Project - -There will be a new directory in your working directory. If you've used -the _curl_ approach, the directory is called `quickstart`. Otherwise, -it has the name of your `artifactId`: - -{% highlight bash %} -$ tree quickstart/ -quickstart/ -├── pom.xml -└── src - └── main - ├── resources - │ └── log4j2.properties - └── scala - └── org - └── myorg - └── quickstart - ├── BatchJob.scala - └── StreamingJob.scala -{% endhighlight %} - -The sample project is a __Maven project__, which contains two classes: _StreamingJob_ and _BatchJob_ are the basic skeleton programs for a *DataStream* and *DataSet* program. -The _main_ method is the entry point of the program, both for in-IDE testing/execution and for proper deployments. - -We recommend you __import this project into your IDE__. - -IntelliJ IDEA supports Maven out of the box and offers a plugin for Scala development. -From our experience, IntelliJ provides the best experience for developing Flink applications. - -For Eclipse, you need the following plugins, which you can install from the provided Eclipse Update Sites: - -* _Eclipse 4.x_ - * [Scala IDE](http://download.scala-ide.org/sdk/lithium/e44/scala211/stable/site) - * [m2eclipse-scala](http://alchim31.free.fr/m2e-scala/update-site) - * [Build Helper Maven Plugin](https://repo1.maven.org/maven2/.m2e/connectors/m2eclipse-buildhelper/0.15.0/N/0.15.0.201207090124/) -* _Eclipse 3.8_ - * [Scala IDE for Scala 2.11](http://download.scala-ide.org/sdk/helium/e38/scala211/stable/site) or [Scala IDE for Scala 2.10](http://download.scala-ide.org/sdk/helium/e38/scala210/stable/site) - * [m2eclipse-scala](http://alchim31.free.fr/m2e-scala/update-site) - * [Build Helper Maven Plugin](https://repository.sonatype.org/content/repositories/forge-sites/m2e-extras/0.14.0/N/0.14.0.201109282148/) - -### Build Project - -If you want to __build/package your project__, go to your project directory and -run the '`mvn clean package`' command. -You will __find a JAR file__ that contains your application, plus connectors and libraries -that you may have added as dependencies to the application: `target/<artifact-id>-<version>.jar`. - -__Note:__ If you use a different class than *StreamingJob* as the application's main class / entry point, -we recommend you change the `mainClass` setting in the `pom.xml` file accordingly. That way, Flink -can run the application from the JAR file without additionally specifying the main class. - - -## Next Steps - -Write your application!
- -If you are writing a streaming application and you are looking for inspiration what to write, -take a look at the [Stream Processing Application Tutorial]({{ site.baseurl }}/getting-started/walkthroughs/datastream_api.html) - -If you are writing a batch processing application and you are looking for inspiration what to write, -take a look at the [Batch Application Examples]({{ site.baseurl }}/dev/batch/examples.html) - -For a complete overview over the APIs, have a look at the -[DataStream API]({{ site.baseurl }}/dev/datastream_api.html) and -[DataSet API]({{ site.baseurl }}/dev/batch/index.html) sections. - -[Here]({{ site.baseurl }}/ops/deployment/local.html) you can find out how to run an application outside the IDE on a local cluster. - -If you have any trouble, ask on our -[Mailing List](http://mail-archives.apache.org/mod_mbox/flink-user/). -We are happy to provide help. - -{% top %} diff --git a/docs/getting-started/project-setup/scala_api_quickstart.zh.md b/docs/getting-started/project-setup/scala_api_quickstart.zh.md deleted file mode 100644 index 8628d07..0000000 --- a/docs/getting-started/project-setup/scala_api_quickstart.zh.md +++ /dev/null @@ -1,241 +0,0 @@ ---- -title: "Scala 项目模板" -nav-title: Scala 项目模板 -nav-parent_id: project-setup -nav-pos: 1 ---- -<!-- -Licensed to the Apache Software Foundation (ASF) under one -or more contributor license agreements. See the NOTICE file -distributed with this work for additional information -regarding copyright ownership. The ASF licenses this file -to you under the Apache License, Version 2.0 (the -"License"); you may not use this file except in compliance -with the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, -software distributed under the License is distributed on an -"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -KIND, either express or implied. See the License for the -specific language governing permissions and limitations -under the License. ---> - -* This will be replaced by the TOC -{:toc} - - -## 构建工具 - -可以使用不同的构建工具来构建Flink项目。 -为了快速入门,Flink为以下构建工具提供了项目模板: - -- [SBT](#sbt) -- [Maven](#maven) - -这些模板将帮助你建立项目的框架并创建初始化的构建文件。 - -## SBT - -### 创建项目 - -你可以通过以下两种方法之一构建新项目: - -<ul class="nav nav-tabs" style="border-bottom: none;"> - <li class="active"><a href="#sbt_template" data-toggle="tab">使用 <strong>sbt 模版</strong></a></li> - <li><a href="#quickstart-script-sbt" data-toggle="tab">运行 <strong>quickstart 脚本</strong></a></li> -</ul> - -<div class="tab-content"> - <div class="tab-pane active" id="sbt_template"> -{% highlight bash %} -$ sbt new tillrohrmann/flink-project.g8 -{% endhighlight %} - 这里将提示你输入几个参数 (项目名称,Flink版本...) 
然后从 <a href="https://github.com/tillrohrmann/flink-project.g8">Flink项目模版</a>创建一个Flink项目。 - 你的sbt版本需要不小于0.13.13才能执行这个命令。如有必要,你可以参考这个<a href="http://www.scala-sbt.org/download.html">安装指南</a>获取合适版本的sbt。 - </div> - <div class="tab-pane" id="quickstart-script-sbt"> -{% highlight bash %} -$ bash <(curl https://flink.apache.org/q/sbt-quickstart.sh) -{% endhighlight %} - 这将在<strong>指定的</strong>目录创建一个Flink项目。 - </div> -</div> - -### 构建项目 - -为了构建你的项目,仅需简单的运行 `sbt clean assembly` 命令。 -这将在 __target/scala_your-major-scala-version/__ 目录中创建一个 fat-jar __your-project-name-assembly-0.1-SNAPSHOT.jar__。 - -### 运行项目 - -为了构建你的项目,需要运行 `sbt run`。 - -默认情况下,这会在运行 `sbt` 的 JVM 中运行你的作业。 -为了在不同的 JVM 中运行,请添加以下内容添加到 `build.sbt` - -{% highlight scala %} -fork in run := true -{% endhighlight %} - - -#### IntelliJ - -我们建议你使用 [IntelliJ](https://www.jetbrains.com/idea/) 来开发Flink作业。 -开始,你需要将新建的项目导入到 IntelliJ。 -通过 `File -> New -> Project from Existing Sources...` 操作路径,然后选择项目目录。之后 IntelliJ 将自动检测到 `build.sbt` 文件并设置好所有内容。 - -为了运行你的Flink作业,建议选择 `mainRunner` 模块作为 __Run/Debug Configuration__ 的类路径。 -这将确保在作业执行时可以使用所有设置为 _provided_ 的依赖项。 -你可以通过 `Run -> Edit Configurations...` 配置 __Run/Debug Configurations__,然后从 _Use classpath of module_ 下拉框中选择 `mainRunner`。 - -#### Eclipse - -为了将新建的项目导入 [Eclipse](https://eclipse.org/), 首先需要创建一个 Eclipse 项目文件。 -通过插件 [sbteclipse](https://github.com/typesafehub/sbteclipse) 创建项目文件,并将下面的内容添加到 `PROJECT_DIR/project/plugins.sbt` 文件中: - -{% highlight bash %} -addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "4.0.0") -{% endhighlight %} - -在 `sbt` 中使用以下命令创建 Eclipse 项目文件 - -{% highlight bash %} -> eclipse -{% endhighlight %} - -现在你可以通过 `File -> Import... -> Existing Projects into Workspace` 将项目导入 Eclipse,然后选择项目目录。 - -## Maven - -### 环境要求 - -唯一的要求是安装 __Maven 3.0.4__ (或更高版本) 和 __Java 8.x__。 - - -### 创建项目 - -使用以下命令之一来 __创建项目__: - -<ul class="nav nav-tabs" style="border-bottom: none;"> - <li class="active"><a href="#maven-archetype" data-toggle="tab">使用 <strong>Maven archetypes</strong></a></li> - <li><a href="#quickstart-script" data-toggle="tab">运行 <strong>quickstart 开始脚本</strong></a></li> -</ul> - -<div class="tab-content"> - <div class="tab-pane active" id="maven-archetype"> -{% highlight bash %} -$ mvn archetype:generate \ - -DarchetypeGroupId=org.apache.flink \ - -DarchetypeArtifactId=flink-quickstart-scala \{% unless site.is_stable %} - -DarchetypeCatalog=https://repository.apache.org/content/repositories/snapshots/ \{% endunless %} - -DarchetypeVersion={{site.version}} -{% endhighlight %} - 这将允许你 <strong>为新项目命名</strong>。同时以交互式的方式询问你项目的 groupId,artifactId 和 package 名称. 
- </div> - <div class="tab-pane" id="quickstart-script"> -{% highlight bash %} -{% if site.is_stable %} -$ curl https://flink.apache.org/q/quickstart-scala.sh | bash -s {{site.version}} -{% else %} -$ curl https://flink.apache.org/q/quickstart-scala-SNAPSHOT.sh | bash -s {{site.version}} -{% endif %} -{% endhighlight %} - </div> - {% unless site.is_stable %} - <p style="border-radius: 5px; padding: 5px" class="bg-danger"> - <b>注意</b>:Maven 3.0 及更高版本,不再支持通过命令行指定仓库(-DarchetypeCatalog)。有关这个改动的详细信息, - 请参阅 <a href="http://maven.apache.org/archetype/maven-archetype-plugin/archetype-repository.html">Maven 官方文档</a> - 如果你希望使用快照仓库,则需要在 settings.xml 文件中添加一个仓库条目。例如: -{% highlight bash %} -<settings> - <activeProfiles> - <activeProfile>apache</activeProfile> - </activeProfiles> - <profiles> - <profile> - <id>apache</id> - <repositories> - <repository> - <id>apache-snapshots</id> - <url>https://repository.apache.org/content/repositories/snapshots/</url> - </repository> - </repositories> - </profile> - </profiles> -</settings> -{% endhighlight %} - </p> - {% endunless %} -</div> - - -### 检查项目 - -项目创建后,工作目录中将多出一个新目录。如果你使用的是 _curl_ 方式创建项目,目录称为 `quickstart`,如果是另外一种创建方式,目录则称为你指定的 `artifactId`。 - -{% highlight bash %} -$ tree quickstart/ -quickstart/ -├── pom.xml -└── src - └── main - ├── resources - │ └── log4j2.properties - └── scala - └── org - └── myorg - └── quickstart - ├── BatchJob.scala - └── StreamingJob.scala -{% endhighlight %} - -样例项目是一个 __Maven 项目__, 包含了两个类: _StreamingJob_ 和 _BatchJob_ 是 *DataStream* 和 *DataSet* 程序的基本框架程序. -_main_ 方法是程序的入口, 既用于 IDE 内的测试/执行,也用于合理部署。 - -我们建议你将 __此项目导入你的 IDE__。 - -IntelliJ IDEA 支持 Maven 开箱即用,并为Scala开发提供插件。 -从我们的经验来看,IntelliJ 提供了最好的Flink应用程序开发体验。 - -对于 Eclipse,需要以下的插件,你可以从提供的 Eclipse Update Sites 安装这些插件: - -* _Eclipse 4.x_ - * [Scala IDE](http://download.scala-ide.org/sdk/lithium/e44/scala211/stable/site) - * [m2eclipse-scala](http://alchim31.free.fr/m2e-scala/update-site) - * [Build Helper Maven Plugin](https://repo1.maven.org/maven2/.m2e/connectors/m2eclipse-buildhelper/0.15.0/N/0.15.0.201207090124/) -* _Eclipse 3.8_ - * [Scala IDE for Scala 2.11](http://download.scala-ide.org/sdk/helium/e38/scala211/stable/site) 或者 [Scala IDE for Scala 2.10](http://download.scala-ide.org/sdk/helium/e38/scala210/stable/site) - * [m2eclipse-scala](http://alchim31.free.fr/m2e-scala/update-site) - * [Build Helper Maven Plugin](https://repository.sonatype.org/content/repositories/forge-sites/m2e-extras/0.14.0/N/0.14.0.201109282148/) - -### 构建 - -如果你想要 __构建/打包你的项目__, 进入到你的项目目录,并执行命令‘`mvn clean package`’。 -你将 __找到一个 JAR 文件__,其中包含了你的应用程序,以及已作为依赖项添加到应用程序的连接器和库:`target/<artifact-id>-<version>.jar`。 - -__注意:__ 如果你使用其他类而不是 *StreamingJob* 作为应用程序的主类/入口,我们建议你相应地更改 `pom.xml` 文件中 `mainClass` 的设置。这样,Flink 运行应用程序时无需另外指定主类。 - - -## 下一步 - -开始编写你的应用! 
- -如果你准备编写流处理应用,正在寻找灵感来写什么, -可以看看[流处理应用程序教程]({{ site.baseurl }}/zh/getting-started/walkthroughs/datastream_api.html) - -如果你准备编写批处理应用,正在寻找灵感来写什么, -可以看看[批处理应用程序示例]({{ site.baseurl }}/zh/dev/batch/examples.html) - -有关 API 的完整概述,请查看 -[DataStream API]({{ site.baseurl }}/zh/dev/datastream_api.html) 和 -[DataSet API]({{ site.baseurl }}/zh/dev/batch/index.html) 部分。 - -在[这里]({{ site.baseurl }}/zh/ops/deployment/local.html),你可以找到如何在IDE外的本地集群中运行应用程序。 - -如果你有任何问题,请发信至我们的[邮箱列表](http://mail-archives.apache.org/mod_mbox/flink-user/)。 -我们很乐意提供帮助。 - -{% top %} diff --git a/docs/redirects/dependencies.md b/docs/redirects/dependencies.md index 26debf6..7984834 100644 --- a/docs/redirects/dependencies.md +++ b/docs/redirects/dependencies.md @@ -1,7 +1,7 @@ --- title: "Configuring Dependencies, Connectors, Libraries" layout: redirect -redirect: /dev/projectsetup/dependencies.html +permalink: /dev/project-configuration.html permalink: /start/dependencies.html --- <!-- diff --git a/docs/redirects/dependencies.md b/docs/redirects/getting-started-dependencies.md similarity index 83% copy from docs/redirects/dependencies.md copy to docs/redirects/getting-started-dependencies.md index 26debf6..072a4da 100644 --- a/docs/redirects/dependencies.md +++ b/docs/redirects/getting-started-dependencies.md @@ -1,8 +1,8 @@ --- -title: "Configuring Dependencies, Connectors, Libraries" +title: Configuring Dependencies, Connectors, Libraries layout: redirect -redirect: /dev/projectsetup/dependencies.html -permalink: /start/dependencies.html +redirect: /dev/project-configuration.html +permalink: /getting-started/project-setup/dependencies.html --- <!-- Licensed to the Apache Software Foundation (ASF) under one diff --git a/docs/getting-started/project-setup/index.md b/docs/redirects/java-quickstart.md similarity index 83% rename from docs/getting-started/project-setup/index.md rename to docs/redirects/java-quickstart.md index 354ce40..56d7d90 100644 --- a/docs/getting-started/project-setup/index.md +++ b/docs/redirects/java-quickstart.md @@ -1,9 +1,8 @@ --- -title: "Project Setup" -nav-id: project-setup -nav-title: 'Project Setup' -nav-parent_id: getting-started -nav-pos: 30 +title: Java Quckstart +layout: redirect +redirect: /dev/project-configuration.html +permalink: /getting-started/project-setup/java_api_quickstart.html --- <!-- Licensed to the Apache Software Foundation (ASF) under one diff --git a/docs/getting-started/project-setup/index.zh.md b/docs/redirects/scala-quickstart.md similarity index 83% rename from docs/getting-started/project-setup/index.zh.md rename to docs/redirects/scala-quickstart.md index ae9512e..e8568e6 100644 --- a/docs/getting-started/project-setup/index.zh.md +++ b/docs/redirects/scala-quickstart.md @@ -1,9 +1,8 @@ --- -title: "项目构建设置" -nav-id: project-setup -nav-title: '项目构建设置' -nav-parent_id: getting-started -nav-pos: 30 +title: Java Quckstart +layout: redirect +redirect: /dev/project-configuration.html +permalink: /getting-started/project-setup/scala_api_quickstart.html --- <!-- Licensed to the Apache Software Foundation (ASF) under one diff --git a/docs/redirects/scala_quickstart.md b/docs/redirects/scala_quickstart.md index f50c4dc..62a507f 100644 --- a/docs/redirects/scala_quickstart.md +++ b/docs/redirects/scala_quickstart.md @@ -1,7 +1,7 @@ --- title: "Project Template for Scala" layout: redirect -redirect: /dev/projectsetup/scala_api_quickstart.html +redirect: /dev/project-configuration.html permalink: /quickstart/scala_api_quickstart.html --- <!--