[ 
https://issues.apache.org/jira/browse/FLINK-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585704#comment-14585704
 ] 

ASF GitHub Bot commented on FLINK-2209:
---------------------------------------

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/835#discussion_r32404022
  
    --- Diff: docs/apis/cluster_execution.md ---
    @@ -80,67 +80,73 @@ Note that the program contains custom user code and hence requires a JAR file with
     the classes of the code attached. The constructor of the remote environment
     takes the path(s) to the JAR file(s).
     
    -## Remote Executor
    +## Linking with modules not contained in the binary distribution
     
    -Similar to the RemoteEnvironment, the RemoteExecutor lets you execute
    -Flink programs on a cluster directly. The remote executor accepts a
    -*Plan* object, which describes the program as a single executable unit.
    +The binary distribution contains jar packages in the `lib` folder that are automatically
    +provided to the classpath of your distributed programs. Almost all Flink classes are
    +located there, with a few exceptions such as the streaming connectors and some freshly
    +added modules. To run code depending on these modules you need to make them accessible
    +at runtime, for which we suggest two options:
     
    -### Maven Dependency
    -
    -If you are developing your program in a Maven project, you have to add the
    -`flink-clients` module using this dependency:
    -
    -~~~xml
    -<dependency>
    -  <groupId>org.apache.flink</groupId>
    -  <artifactId>flink-clients</artifactId>
    -  <version>{{ site.version }}</version>
    -</dependency>
    -~~~
    -
    -### Example
    -
    -The following illustrates the use of the `RemoteExecutor` with the Scala API:
    -
    -~~~scala
    -def main(args: Array[String]) {
    -    val input = TextFile("hdfs://path/to/file")
    +1. Either copy the required jar files to the `lib` folder on all of your TaskManagers.
    +2. Or package them with your usercode.
     
    -    val words = input flatMap { _.toLowerCase().split("""\W+""") filter { _ != "" } }
    -    val counts = words groupBy { x => x } count()
    +The latter version is recommended as it respects the classloader management in Flink.
     
    -    val output = counts.write(wordsOutput, CsvOutputFormat())
    -  
    -    val plan = new ScalaPlan(Seq(output), "Word Count")
    -    val executor = new RemoteExecutor("strato-master", 7881, "/path/to/jarfile.jar")
    -    executor.executePlan(p);
    -}
    -~~~
    +### Packaging dependencies with your usercode with Maven
     
    -The following illustrates the use of the `RemoteExecutor` with the Java API (as
    -an alternative to the RemoteEnvironment):
    +To provide these dependencies, which are not included by Flink, we suggest two options with Maven.
     
    -~~~java
    -public static void main(String[] args) throws Exception {
    -    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    +1. The maven assembly plugin builds a so-called fat jar containing all your dependencies.
    +Easy to configure, but overkill in many cases. See
    +[usage](http://maven.apache.org/plugins/maven-assembly-plugin/usage.html).
    +2. The maven unpack plugin, for unpacking the relevant parts of the dependencies and
    +then packaging them with your code.
     
    -    DataSet<String> data = env.readTextFile("hdfs://path/to/file");
    +To the the latter for example for the streaming Kafka connector, `flink-connector-kafka`
    --- End diff ---
    
    Wording of the first sentence. Maybe something like: "Using the latter approach in order to bundle the Kafka connector..."


> Document how to use TableAPI, Gelly and FlinkML, StreamingConnectors on a 
> cluster
> ---------------------------------------------------------------------------------
>
>                 Key: FLINK-2209
>                 URL: https://issues.apache.org/jira/browse/FLINK-2209
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Till Rohrmann
>            Assignee: Márton Balassi
>
> Currently the TableAPI, Gelly, FlinkML and StreamingConnectors are not part 
> of the Flink dist module. Therefore they are not included in the binary 
> distribution. As a consequence, if you want to use one of these libraries the 
> corresponding jar and all their dependencies have to be either manually put 
> on the cluster or the user has to include them in the user code jar.
> Usually a fat jar is built if one uses the quickstart archetypes. However, 
> if one sets up the project manually this is not necessarily the case. 
> Therefore, it should be well documented how to run programs using one of 
> these libraries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
