Re: Flink on EMR Question

2016-01-06 Thread Stephan Ewen
At a first look, I think that "flink-runtime" does not need Apache
Httpclient at all. I'll try to simply remove that dependency...

On Wed, Jan 6, 2016 at 7:14 AM, Chiwan Park  wrote:

> Hi,
>
> Thanks for answering me!
>
> I am happy to hear that the problem will be addressed. :)
>
> About question 2: flink-runtime uses Apache Httpclient 4.2.6, while the S3 file
> system implementation from Amazon uses 4.3.x. There are some API changes
> between the two, so a NoSuchMethodError occurs.
>
> > On Jan 5, 2016, at 11:59 PM, Stephan Ewen  wrote:
> >
> > Hi!
> >
> > Concerning (1): We have seen that a few times. The JVMs/threads sometimes do
> not exit gracefully, and YARN is not always able
> to kill the process (a YARN bug). I am currently working on a refactoring of
> the YARN resource manager (to allow easy addition of other frameworks)
> and have addressed this as part of that. It will be in the master in a bit.
> >
> > Concerning (2) Do you know which component in Flink uses the HTTP client?
> >
> > Greetings,
> > Stephan
> >
> >
> > On Tue, Jan 5, 2016 at 2:49 PM, Maximilian Bode <
> maximilian.b...@tngtech.com> wrote:
> > Hi everyone,
> >
> > Regarding Q1, I believe I have witnessed a comparable phenomenon in a
> (3-node, non-EMR) YARN cluster. After shutting down the yarn session via
> `stop`, one container seems to linger around. `yarn application -list` is
> empty, whereas `bin/yarn-session.sh -q` lists the left-over container.
> Also, there is still one application shown as 'running' in Ambari's YARN
> pane under current applications. Then, after some time (order of a few
> minutes) it disappears and the resources are available again.
> >
> > I have not tested this behavior extensively so far. Notably, I was not
> able to reproduce it by just starting a session and then ending it again
> right away without looking at the JobManager web interface. Maybe this
> produces some kind of lag as far as YARN containers are concerned?
> >
> > Cheers,
> > Max
> >
> > > On 04.01.2016 at 12:52, Chiwan Park wrote:
> > >
> > > Hi All,
> > >
> > > I have some problems using Flink on Amazon EMR cluster.
> > >
> > > Q1. Sometimes the jobmanager container still exists after destroying the yarn
> session by pressing Ctrl+C. In that case, the Flink YARN app seems to have exited
> correctly in the YARN RM dashboard, but there is still a running container in the
> dashboard. From the logs of the container, I realized that the container is the
> jobmanager.
> > >
> > > I cannot kill the container because there is no permission to restart
> the YARN RM in Amazon EMR. In my small Hadoop cluster (w/ 3 nodes), the problem
> doesn't appear.
> > >
> > > Q2. I tried to use the S3 file system in Flink on EMR, but I can't use it
> because of a version conflict of Apache Httpclient. By default, the
> implementation of the S3 file system in EMR is
> `com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem`, which is linked against
> a different version of Apache Httpclient.
> > >
> > > As I wrote above, I cannot restart the Hadoop cluster after modifying
> conf-site.xml because of the lack of permission. How can I solve this problem?
> > >
> > > Regards,
> > > Chiwan Park
> > >
> > >
>
> Regards,
> Chiwan Park
>
>
>


Re: flink kafka scala error

2016-01-06 Thread Till Rohrmann
Hi Madhukar,

could you check whether your Flink installation contains the
flink-dist-0.10.1.jar in the lib folder? This file contains the necessary
scala-library.jar which you are missing. You can also remove the line
org.scala-lang:scala-library, which excludes the
scala-library dependency from being included in the fat jar of your job.

Cheers,
Till
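For illustration, the exclusion Till refers to normally sits in the
maven-shade-plugin section of the quickstart pom. A sketch (based on the
standard quickstart layout, not on the pom below; deleting the marked line
lets the Scala runtime into the fat jar):

    <artifactSet>
        <excludes>
            <!-- remove this line so scala-library lands in the fat jar -->
            <exclude>org.scala-lang:scala-library</exclude>
        </excludes>
    </artifactSet>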

On Wed, Jan 6, 2016 at 5:54 AM, Madhukar Thota 
wrote:

> Hi
>
> I am seeing the following error when I try to run the jar on the Flink
> cluster. I am not sure which dependency is missing.
>
>  /opt/DataHUB/flink-0.10.1/bin/flink  run datahub-heka-1.0-SNAPSHOT.jar
> flink.properties
> java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
> at kafka.utils.Pool.<init>(Pool.scala:28)
> at
> kafka.consumer.FetchRequestAndResponseStatsRegistry$.<init>(FetchRequestAndResponseStats.scala:60)
> at
> kafka.consumer.FetchRequestAndResponseStatsRegistry$.<clinit>(FetchRequestAndResponseStats.scala)
> at kafka.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:39)
> at
> kafka.javaapi.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:34)
> at
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer.getPartitionsForTopic(FlinkKafkaConsumer.java:691)
> at
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer.<init>(FlinkKafkaConsumer.java:281)
> at
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer082.<init>(FlinkKafkaConsumer082.java:49)
> at com.lmig.datahub.heka.Main.main(Main.java:39)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:497)
> at
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:395)
> at
> org.apache.flink.client.program.Client.runBlocking(Client.java:252)
> at
> org.apache.flink.client.CliFrontend.executeProgramBlocking(CliFrontend.java:676)
> at org.apache.flink.client.CliFrontend.run(CliFrontend.java:326)
> at
> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:978)
> at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1028)
> Caused by: java.lang.ClassNotFoundException:
> scala.collection.GenTraversableOnce$class
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 20 more
>
> The exception above occurred while trying to run your command.
>
>
> Here is my pom.xml:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <project xmlns="http://maven.apache.org/POM/4.0.0"
>          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>          xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
>          http://maven.apache.org/xsd/maven-4.0.0.xsd">
>     <modelVersion>4.0.0</modelVersion>
>
>     <groupId>com.datahub</groupId>
>     <artifactId>datahub-heka</artifactId>
>     <version>1.0-SNAPSHOT</version>
>
>     <dependencies>
>         <dependency>
>             <groupId>org.apache.flink</groupId>
>             <artifactId>flink-java</artifactId>
>             <version>0.10.1</version>
>         </dependency>
>         <dependency>
>             <groupId>org.apache.flink</groupId>
>             <artifactId>flink-streaming-java</artifactId>
>             <version>0.10.1</version>
>         </dependency>
>         <dependency>
>             <groupId>org.apache.flink</groupId>
>             <artifactId>flink-clients</artifactId>
>             <version>0.10.1</version>
>         </dependency>
>         <dependency>
>             <groupId>org.apache.kafka</groupId>
>             <artifactId>kafka_2.10</artifactId>
>             <version>0.8.2.2</version>
>         </dependency>
>         <dependency>
>             <groupId>org.apache.flink</groupId>
>             <artifactId>flink-connector-kafka</artifactId>
>             <version>0.10.1</version>
>         </dependency>
>         <dependency>
>             <groupId>org.apache.flink</groupId>
>             <artifactId>flink-connector-elasticsearch</artifactId>
>             <version>0.10.1</version>
>         </dependency>
>         <dependency>
>             <groupId>org.elasticsearch</groupId>
>             <artifactId>elasticsearch</artifactId>
>             <version>1.7.2</version>
>         </dependency>
>         <dependency>
>             <groupId>org.elasticsearch</groupId>
>             <artifactId>elasticsearch-shield</artifactId>
>             <version>1.3.3</version>
>         </dependency>
>         <dependency>
>             <groupId>org.elasticsearch</groupId>
>             <artifactId>elasticsearch-license-plugin</artifactId>
>             <version>1.0.0</version>
>         </dependency>
>         <dependency>
>             <groupId>com.fasterxml.jackson.core</groupId>
>             <artifactId>jackson-core</artifactId>
>             <version>2.6.4</version>
>         </dependency>
>         <dependency>
>             <groupId>com.fasterxml.jackson.core</groupId>
>             <artifactId>jackson-databind</artifactId>
>             <version>2.6.4</version>
>         </dependency>
>     </dependencies>
>
>     <repositories>
>         <repository>
>             <id>elasticsearch-releases</id>
>             <url>http://maven.elasticsearch.org/releases</url>
>             <releases>
>                 <enabled>true</enabled>
>             </releases>
>             <snapshots>
>                 <enabled>false</enabled>
>             </snapshots>
>         </repository>
>     </repositories>
>
>     <build>
>         <plugins>
>             <plugin>
>                 <groupId>org.apache.maven.plugins</groupId>
>                 <artifactId>maven-shade-plugin</artifactId>
>                 <version>2.4.1</version>
>                 <executions>
>                     <execution>
>                         <phase>package</phase>
>                         <goals>
>                             <goal>shade</goal>

Re: Flink on EMR Question

2016-01-06 Thread Ufuk Celebi
@Stephan: It was added to the dependency management section in order to enforce
a higher version for the S3 client, because it was causing problems earlier.
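For context, such a dependencyManagement entry would look roughly like the
sketch below (the httpclient version number is an assumption derived from
Chiwan's report, not the exact value pinned in the Flink pom):

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.apache.httpcomponents</groupId>
                <artifactId>httpclient</artifactId>
                <version>4.3.6</version>
            </dependency>
        </dependencies>
    </dependencyManagement>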

> On 06 Jan 2016, at 11:14, Chiwan Park  wrote:
> 
> Great! Thanks for addressing it!
> 
>> On Jan 6, 2016, at 5:51 PM, Stephan Ewen  wrote:
>> 
>> At a first look, I think that "flink-runtime" does not need Apache 
>> Httpclient at all. I'll try to simply remove that dependency...
>> 
>> On Wed, Jan 6, 2016 at 7:14 AM, Chiwan Park  wrote:
>> Hi,
>> 
>> Thanks for answering me!
>> 
>> I am happy to hear that the problem will be addressed. :)
>> 
>> About question 2: flink-runtime uses Apache Httpclient 4.2.6, while the S3 file 
>> system implementation from Amazon uses 4.3.x. There are some API changes 
>> between the two, so a NoSuchMethodError occurs.
>> 
>>> On Jan 5, 2016, at 11:59 PM, Stephan Ewen  wrote:
>>> 
>>> Hi!
>>> 
>>> Concerning (1): We have seen that a few times. The JVMs/threads sometimes 
>>> do not exit gracefully, and YARN is not always able 
>>> to kill the process (a YARN bug). I am currently working on a refactoring of 
>>> the YARN resource manager (to allow easy addition of other frameworks) 
>>> and have addressed this as part of that. It will be in the master in a bit.
>>> 
>>> Concerning (2) Do you know which component in Flink uses the HTTP client?
>>> 
>>> Greetings,
>>> Stephan
>>> 
>>> 
>>> On Tue, Jan 5, 2016 at 2:49 PM, Maximilian Bode 
>>>  wrote:
>>> Hi everyone,
>>> 
>>> Regarding Q1, I believe I have witnessed a comparable phenomenon in a 
>>> (3-node, non-EMR) YARN cluster. After shutting down the yarn session via 
>>> `stop`, one container seems to linger around. `yarn application -list` is 
>>> empty, whereas `bin/yarn-session.sh -q` lists the left-over container. 
>>> Also, there is still one application shown as 'running' in Ambari's YARN 
>>> pane under current applications. Then, after some time (order of a few 
>>> minutes) it disappears and the resources are available again.
>>> 
>>> I have not tested this behavior extensively so far. Notably, I was not 
>>> able to reproduce it by just starting a session and then ending it again 
>>> right away without looking at the JobManager web interface. Maybe this 
>>> produces some kind of lag as far as YARN containers are concerned?
>>> 
>>> Cheers,
>>> Max
>>> 
 On 04.01.2016 at 12:52, Chiwan Park wrote:
 
 Hi All,
 
 I have some problems using Flink on Amazon EMR cluster.
 
 Q1. Sometimes the jobmanager container still exists after destroying the yarn 
 session by pressing Ctrl+C. In that case, the Flink YARN app seems to have exited 
 correctly in the YARN RM dashboard, but there is still a running container in the 
 dashboard. From the logs of the container, I realized that the container is the 
 jobmanager.
 
 I cannot kill the container because there is no permission to restart the YARN 
 RM in Amazon EMR. In my small Hadoop cluster (w/ 3 nodes), the problem 
 doesn't appear.
 
 Q2. I tried to use the S3 file system in Flink on EMR, but I can't use it 
 because of a version conflict of Apache Httpclient. By default, the 
 implementation of the S3 file system in EMR is 
 `com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem`, which is linked against 
 a different version of Apache Httpclient.
 
 As I wrote above, I cannot restart the Hadoop cluster after modifying 
 conf-site.xml because of the lack of permission. How can I solve this problem?
 
 Regards,
 Chiwan Park
 
 
>> 
>> Regards,
>> Chiwan Park
> 
> Regards,
> Chiwan Park
> 
> 



Re: Flink on EMR Question

2016-01-06 Thread Chiwan Park
Great! Thanks for addressing it!

> On Jan 6, 2016, at 5:51 PM, Stephan Ewen  wrote:
> 
> At a first look, I think that "flink-runtime" does not need Apache Httpclient 
> at all. I'll try to simply remove that dependency...
> 
> On Wed, Jan 6, 2016 at 7:14 AM, Chiwan Park  wrote:
> Hi,
> 
> Thanks for answering me!
> 
> I am happy to hear that the problem will be addressed. :)
> 
> About question 2: flink-runtime uses Apache Httpclient 4.2.6, while the S3 file 
> system implementation from Amazon uses 4.3.x. There are some API changes 
> between the two, so a NoSuchMethodError occurs.
> 
> > On Jan 5, 2016, at 11:59 PM, Stephan Ewen  wrote:
> >
> > Hi!
> >
> > Concerning (1): We have seen that a few times. The JVMs/threads sometimes 
> > do not exit gracefully, and YARN is not always able 
> > to kill the process (a YARN bug). I am currently working on a refactoring of 
> > the YARN resource manager (to allow easy addition of other frameworks) 
> > and have addressed this as part of that. It will be in the master in a bit.
> >
> > Concerning (2) Do you know which component in Flink uses the HTTP client?
> >
> > Greetings,
> > Stephan
> >
> >
> > On Tue, Jan 5, 2016 at 2:49 PM, Maximilian Bode 
> >  wrote:
> > Hi everyone,
> >
> > Regarding Q1, I believe I have witnessed a comparable phenomenon in a 
> > (3-node, non-EMR) YARN cluster. After shutting down the yarn session via 
> > `stop`, one container seems to linger around. `yarn application -list` is 
> > empty, whereas `bin/yarn-session.sh -q` lists the left-over container. 
> > Also, there is still one application shown as 'running' in Ambari's YARN 
> > pane under current applications. Then, after some time (order of a few 
> > minutes) it disappears and the resources are available again.
> >
> > I have not tested this behavior extensively so far. Notably, I was not 
> > able to reproduce it by just starting a session and then ending it again 
> > right away without looking at the JobManager web interface. Maybe this 
> > produces some kind of lag as far as YARN containers are concerned?
> >
> > Cheers,
> > Max
> >
> > > On 04.01.2016 at 12:52, Chiwan Park wrote:
> > >
> > > Hi All,
> > >
> > > I have some problems using Flink on Amazon EMR cluster.
> > >
> > > Q1. Sometimes the jobmanager container still exists after destroying the yarn 
> > > session by pressing Ctrl+C. In that case, the Flink YARN app seems to have exited 
> > > correctly in the YARN RM dashboard, but there is still a running container in the 
> > > dashboard. From the logs of the container, I realized that the container is the 
> > > jobmanager.
> > >
> > > I cannot kill the container because there is no permission to restart 
> > > the YARN RM in Amazon EMR. In my small Hadoop cluster (w/ 3 nodes), the 
> > > problem doesn't appear.
> > >
> > > Q2. I tried to use the S3 file system in Flink on EMR, but I can't use it 
> > > because of a version conflict of Apache Httpclient. By default, the 
> > > implementation of the S3 file system in EMR is 
> > > `com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem`, which is linked against 
> > > a different version of Apache Httpclient.
> > >
> > > As I wrote above, I cannot restart the Hadoop cluster after modifying 
> > > conf-site.xml because of the lack of permission. How can I solve this problem?
> > >
> > > Regards,
> > > Chiwan Park
> > >
> > >
> 
> Regards,
> Chiwan Park

Regards,
Chiwan Park




Re: Local collection data sink for the streaming API

2016-01-06 Thread Filipe Correia
Perfect, thanks!

Filipe

On Tue, Jan 5, 2016 at 6:23 PM, Gábor Gévay  wrote:
> Try the getJavaStream method of the scala DataStream.
>
> Best,
> Gábor
>
>
>
>
> 2016-01-05 19:14 GMT+01:00 Filipe Correia :
>> Hi Gábor, Thanks!
>>
>> I'm using Scala though. DataStreamUtils.collect() depends on
>> org.apache.flink.streaming.api.datastream.DataStream, rather than
>> org.apache.flink.streaming.api.scala.DataStream. Any suggestion on how
>> to handle this, other than creating my own scala implementation of
>> DataStreamUtils.collect()?
>>
>> Thanks,
>>
>> Filipe
>>
>> On Tue, Jan 5, 2016 at 3:33 PM, Gábor Gévay  wrote:
>>> Hi Filipe,
>>>
>>> You can take a look at `DataStreamUtils.collect` in
>>> flink-contrib/flink-streaming-contrib.
>>>
>>> Best,
>>> Gábor
>>>
>>>
>>>
>>> 2016-01-05 16:14 GMT+01:00 Filipe Correia :
 Hi,

 Collecting results locally (e.g., for unit testing) is possible in the
 DataSet API by using "LocalCollectionOutputFormat", as described in
 the programming guide:
 https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/programming_guide.html#collection-data-sources-and-sinks

 Can something similar be done for the DataStream API?

 Thanks,

 Filipe
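Putting Gábor's two suggestions together, a minimal runnable sketch in Java
(generateSequence and the flink-streaming-contrib dependency are assumptions
made for the sake of a self-contained example; from the Scala API, pass
scalaStream.getJavaStream where the Java stream is expected):

    import java.util.Iterator;
    import org.apache.flink.contrib.streaming.DataStreamUtils;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CollectExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
            // Any stream works here; this one just emits the numbers 1..10.
            DataStream<Long> stream = env.generateSequence(1, 10);
            // collect() runs the job and streams results back to the client.
            Iterator<Long> it = DataStreamUtils.collect(stream);
            while (it.hasNext()) {
                System.out.println(it.next());
            }
        }
    }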


Re: Flink on EMR Question

2016-01-06 Thread Stephan Ewen
Would it cause problems if I remove it from the "flink-runtime" pom?

Seems strange to have a dependency there that we do not even use...

On Wed, Jan 6, 2016 at 12:07 PM, Ufuk Celebi  wrote:

> @Stephan: It was added to the dependency management section in order to
> enforce a higher version for S3 client, because it was causing problems
> earlier.
>
> > On 06 Jan 2016, at 11:14, Chiwan Park  wrote:
> >
> > Great! Thanks for addressing it!
> >
> >> On Jan 6, 2016, at 5:51 PM, Stephan Ewen  wrote:
> >>
> >> At a first look, I think that "flink-runtime" does not need Apache
> Httpclient at all. I'll try to simply remove that dependency...
> >>
> >> On Wed, Jan 6, 2016 at 7:14 AM, Chiwan Park 
> wrote:
> >> Hi,
> >>
> >> Thanks for answering me!
> >>
> >> I am happy to hear that the problem will be addressed. :)
> >>
> >> About question 2: flink-runtime uses Apache Httpclient 4.2.6, while the S3
> file system implementation from Amazon uses 4.3.x. There are some API
> changes between the two, so a NoSuchMethodError occurs.
> >>
> >>> On Jan 5, 2016, at 11:59 PM, Stephan Ewen  wrote:
> >>>
> >>> Hi!
> >>>
> >>> Concerning (1): We have seen that a few times. The JVMs/threads sometimes
> do not exit gracefully, and YARN is not always able
> to kill the process (a YARN bug). I am currently working on a refactoring of
> the YARN resource manager (to allow easy addition of other frameworks)
> and have addressed this as part of that. It will be in the master in a bit.
> >>>
> >>> Concerning (2) Do you know which component in Flink uses the HTTP
> client?
> >>>
> >>> Greetings,
> >>> Stephan
> >>>
> >>>
> >>> On Tue, Jan 5, 2016 at 2:49 PM, Maximilian Bode <
> maximilian.b...@tngtech.com> wrote:
> >>> Hi everyone,
> >>>
> >>> Regarding Q1, I believe I have witnessed a comparable phenomenon in a
> (3-node, non-EMR) YARN cluster. After shutting down the yarn session via
> `stop`, one container seems to linger around. `yarn application -list` is
> empty, whereas `bin/yarn-session.sh -q` lists the left-over container.
> Also, there is still one application shown as 'running' in Ambari's YARN
> pane under current applications. Then, after some time (order of a few
> minutes) it disappears and the resources are available again.
> >>>
> >>> I have not tested this behavior extensively so far. Notably, I was
> not able to reproduce it by just starting a session and then ending it
> again right away without looking at the JobManager web interface. Maybe
> this produces some kind of lag as far as YARN containers are concerned?
> >>>
> >>> Cheers,
> >>> Max
> >>>
>  On 04.01.2016 at 12:52, Chiwan Park wrote:
> 
>  Hi All,
> 
>  I have some problems using Flink on Amazon EMR cluster.
> 
>  Q1. Sometimes the jobmanager container still exists after destroying the
> yarn session by pressing Ctrl+C. In that case, the Flink YARN app seems to have exited
> correctly in the YARN RM dashboard, but there is still a running container in the
> dashboard. From the logs of the container, I realized that the container is the
> jobmanager.
> 
>  I cannot kill the container because there is no permission to restart the
> YARN RM in Amazon EMR. In my small Hadoop cluster (w/ 3 nodes), the problem
> doesn't appear.
> 
>  Q2. I tried to use the S3 file system in Flink on EMR, but I can't use it
> because of a version conflict of Apache Httpclient. By default, the
> implementation of the S3 file system in EMR is
> `com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem`, which is linked against
> a different version of Apache Httpclient.
> 
>  As I wrote above, I cannot restart the Hadoop cluster after modifying
> conf-site.xml because of the lack of permission. How can I solve this problem?
> 
>  Regards,
>  Chiwan Park
> 
> 
> >>
> >> Regards,
> >> Chiwan Park
> >
> > Regards,
> > Chiwan Park
> >
> >
>
>


Re: kafka integration issue

2016-01-06 Thread Stephan Ewen
Wow, okay, you must have hit exactly the point in time when the update was
pushed ;-)

On Wed, Jan 6, 2016 at 2:18 PM, Alex Rovner 
wrote:

> Updating to latest version worked. Thanks! My git repo was less than 1 day
> old :-(
>
> On Wed, Jan 6, 2016 at 4:54 AM Stephan Ewen  wrote:
>
>> "java.lang.NoSuchMethodError" in Java virtually always means that the
>> code was compiled against a different version than the one it runs against.
>>
>> The version in "~/git/flink/" must be slightly outdated. Can you pull
>> the latest update of the 1.0-SNAPSHOT master and rebuild the code?
>>
>> Stephan
>>
>> On Tue, Jan 5, 2016 at 9:48 PM, Robert Metzger 
>> wrote:
>>
>>> Hi Alex,
>>>
>>> How recent is your Flink 1.0-SNAPSHOT build? Maybe the code on the
>>> (local) cluster (in /git/flink/flink-dist/target/
>>> flink-1.0-SNAPSHOT-bin/flink-1.0-SNAPSHOT/) is not up to date?
>>>
>>> I just tried it locally, and the job seems to execute:
>>>
>>> ./bin/flink run
>>> /home/robert/Downloads/flink-poc/target/flink-poc-1.0-SNAPSHOT.jar
>>> org.apache.flink.streaming.api.scala.DataStream@436bc360
>>> 01/05/2016 21:44:09 Job execution switched to status RUNNING.
>>> 01/05/2016 21:44:09 Source: Custom Source -> Map(1/1) switched to
>>> SCHEDULED
>>> 01/05/2016 21:44:09 Source: Custom Source -> Map(1/1) switched to
>>> DEPLOYING
>>> 01/05/2016 21:44:09 Source: Custom Source -> Map(1/1) switched to
>>> RUNNING
>>>
>>> By the way, in order to print the stream, you have to
>>> call counts.print() instead of print(counts).
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Jan 5, 2016 at 9:35 PM, Alex Rovner 
>>> wrote:
>>>
 I believe I have set the version uniformly, unless I am overlooking
 something in the pom. Attaching my project.

 I have tried building with both "mvn clean package" and "mvn clean
 package -Pbuild-jar" and I get the same exception.

 I am running my app with the following command:

 ~/git/flink/flink-dist/target/flink-1.0-SNAPSHOT-bin/flink-1.0-SNAPSHOT/bin/flink
 run -c com.magnetic.KafkaWordCount
 ~/git/flink-poc/target/flink-poc-1.0-SNAPSHOT.jar

 On Tue, Jan 5, 2016 at 12:47 PM Robert Metzger 
 wrote:

> I think the problem is that you only set the version of the Kafka
> connector to 1.0-SNAPSHOT, not for the rest of the Flink dependencies.
>
> On Tue, Jan 5, 2016 at 6:18 PM, Alex Rovner 
> wrote:
>
>> Thanks Till for the info. I tried switching to 1.0-SNAPSHOT and now
>> facing another error:
>>
>> Caused by: java.lang.NoSuchMethodError:
>> org.apache.flink.streaming.api.operators.StreamingRuntimeContext.isCheckpointingEnabled()Z
>> at
>> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer.run(FlinkKafkaConsumer.java:413)
>> at
>> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:58)
>> at
>> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:55)
>> at
>> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:218)
>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
>> at java.lang.Thread.run(Thread.java:745)
>>
>>
>> On Tue, Jan 5, 2016 at 3:54 AM Till Rohrmann 
>> wrote:
>>
>>> Hi Alex,
>>>
>>> this is a bug in the `0.10` release. Is it possible for you to
>>> switch to version `1.0-SNAPSHOT`. With this version, the error should no
>>> longer occur.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Tue, Jan 5, 2016 at 1:31 AM, Alex Rovner <
>>> alex.rov...@magnetic.com> wrote:
>>>
 Hello Flinkers!

 The below program produces the following error when running
 locally. I am building the program with Maven, using 0.10.0, and
 running in
 streaming-only local mode ("start-local-streaming.sh"). I have verified
 that
 Kafka and the topic are working properly by using the kafka-console-*.sh
 scripts. What am I doing wrong? Any help would be appreciated.

 Caused by: java.lang.NumberFormatException: For input string: ""

 at
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)

 at java.lang.Long.parseLong(Long.java:601)

 at java.lang.Long.valueOf(Long.java:803)

 at
 org.apache.flink.streaming.connectors.kafka.internals.ZookeeperOffsetHandler.getOffsetFromZooKeeper(ZookeeperOffsetHandler.java:125)

 at
 org.apache.flink.streaming.connectors.kafka.internals.ZookeeperOffsetHandler.seekFetcherToInitialOffsets(ZookeeperOffsetHandler.java:88)


 def main(args: Array[String]) {
   val env = StreamExecutionEnvironment.getExecutionEnvironment
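Robert's point about mismatched versions generalizes: keeping every Flink
artifact on one version is easiest with a shared Maven property. A minimal
sketch (artifact names are illustrative, taken from this thread):

    <properties>
        <flink.version>1.0-SNAPSHOT</flink.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka</artifactId>
            <version>${flink.version}</version>
        </dependency>
    </dependencies>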


Re: kafka integration issue

2016-01-06 Thread Alex Rovner
Updating to latest version worked. Thanks! My git repo was less than 1 day
old :-(

On Wed, Jan 6, 2016 at 4:54 AM Stephan Ewen  wrote:

> "java.lang.NoSuchMethodError" in Java virtually always means that the
> code was compiled against a different version than the one it runs against.
>
> The version in "~/git/flink/" must be slightly outdated. Can you pull the
> latest update of the 1.0-SNAPSHOT master and rebuild the code?
>
> Stephan
>
> On Tue, Jan 5, 2016 at 9:48 PM, Robert Metzger 
> wrote:
>
>> Hi Alex,
>>
>> How recent is your Flink 1.0-SNAPSHOT build? Maybe the code on the
>> (local) cluster (in /git/flink/flink-dist/target/
>> flink-1.0-SNAPSHOT-bin/flink-1.0-SNAPSHOT/) is not up to date?
>>
>> I just tried it locally, and the job seems to execute:
>>
>> ./bin/flink run
>> /home/robert/Downloads/flink-poc/target/flink-poc-1.0-SNAPSHOT.jar
>> org.apache.flink.streaming.api.scala.DataStream@436bc360
>> 01/05/2016 21:44:09 Job execution switched to status RUNNING.
>> 01/05/2016 21:44:09 Source: Custom Source -> Map(1/1) switched to
>> SCHEDULED
>> 01/05/2016 21:44:09 Source: Custom Source -> Map(1/1) switched to
>> DEPLOYING
>> 01/05/2016 21:44:09 Source: Custom Source -> Map(1/1) switched to
>> RUNNING
>>
>> By the way, in order to print the stream, you have to call counts.print()
>> instead of print(counts).
>>
>>
>>
>>
>>
>>
>> On Tue, Jan 5, 2016 at 9:35 PM, Alex Rovner 
>> wrote:
>>
>>> I believe I have set the version uniformly, unless I am overlooking
>>> something in the pom. Attaching my project.
>>>
>>> I have tried building with both "mvn clean package" and "mvn clean
>>> package -Pbuild-jar" and I get the same exception.
>>>
>>> I am running my app with the following command:
>>>
>>> ~/git/flink/flink-dist/target/flink-1.0-SNAPSHOT-bin/flink-1.0-SNAPSHOT/bin/flink
>>> run -c com.magnetic.KafkaWordCount
>>> ~/git/flink-poc/target/flink-poc-1.0-SNAPSHOT.jar
>>>
>>> On Tue, Jan 5, 2016 at 12:47 PM Robert Metzger 
>>> wrote:
>>>
 I think the problem is that you only set the version of the Kafka
 connector to 1.0-SNAPSHOT, not for the rest of the Flink dependencies.

 On Tue, Jan 5, 2016 at 6:18 PM, Alex Rovner 
 wrote:

> Thanks Till for the info. I tried switching to 1.0-SNAPSHOT and now
> facing another error:
>
> Caused by: java.lang.NoSuchMethodError:
> org.apache.flink.streaming.api.operators.StreamingRuntimeContext.isCheckpointingEnabled()Z
> at
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer.run(FlinkKafkaConsumer.java:413)
> at
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:58)
> at
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:55)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:218)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
> at java.lang.Thread.run(Thread.java:745)
>
>
> On Tue, Jan 5, 2016 at 3:54 AM Till Rohrmann 
> wrote:
>
>> Hi Alex,
>>
>> this is a bug in the `0.10` release. Is it possible for you to switch
>> to version `1.0-SNAPSHOT`. With this version, the error should no longer
>> occur.
>>
>> Cheers,
>> Till
>>
>> On Tue, Jan 5, 2016 at 1:31 AM, Alex Rovner wrote:
>>
>>> Hello Flinkers!
>>>
>>> The below program produces the following error when running locally.
>>> I am building the program with Maven, using 0.10.0, and running in
>>> streaming-only local mode ("start-local-streaming.sh"). I have verified 
>>> that
>>> Kafka and the topic are working properly by using the kafka-console-*.sh
>>> scripts. What am I doing wrong? Any help would be appreciated.
>>>
>>> Caused by: java.lang.NumberFormatException: For input string: ""
>>>
>>> at
>>> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>>>
>>> at java.lang.Long.parseLong(Long.java:601)
>>>
>>> at java.lang.Long.valueOf(Long.java:803)
>>>
>>> at
>>> org.apache.flink.streaming.connectors.kafka.internals.ZookeeperOffsetHandler.getOffsetFromZooKeeper(ZookeeperOffsetHandler.java:125)
>>>
>>> at
>>> org.apache.flink.streaming.connectors.kafka.internals.ZookeeperOffsetHandler.seekFetcherToInitialOffsets(ZookeeperOffsetHandler.java:88)
>>>
>>>
>>> def main(args: Array[String]) {
>>>   val env = StreamExecutionEnvironment.getExecutionEnvironment
>>>
>>>   val properties = new Properties();
>>>   properties.setProperty("bootstrap.servers", "localhost:9092");
>>>   properties.setProperty("zookeeper.connect", "localhost:2181");
>>>   properties.setProperty("group.id", "test");
>>>
>>>   val stream = env
>>> .addSource(new 

Re: Problem to show logs in task managers

2016-01-06 Thread Ana M. Martinez
Hi Till,

I am afraid it does not work in any case.

I am following the steps you indicate in your documentation (for YARN configuration 
and logging with slf4j):

1) Enable log aggregation in yarn-site:
https://ci.apache.org/projects/flink/flink-docs-release-0.7/yarn_setup.html#log-files

2) Include Loggers as indicated here (see WordCountExample below):
https://ci.apache.org/projects/flink/flink-docs-release-0.7/internal_logging.html

But I cannot get the log messages produced inside the map functions. Am I missing 
something?

Thanks,
Ana

On 04 Jan 2016, at 14:00, Till Rohrmann wrote:


I think the YARN application has to be finished in order for the logs to be 
accessible.

Judging from your commands, you’re starting a long-running YARN application 
running Flink with ./bin/yarn-session.sh -n 1 -tm 2048 -s 4. This cluster won’t 
be used though, because you’re executing your job with ./bin/flink run -m 
yarn-cluster which will start another YARN application which is only alive as 
long as the Flink job is executed. If you want to run your job on the long 
running YARN application, then you simply have to omit -m yarn-cluster.

Cheers,
Till
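Spelled out, the two submission modes Till contrasts look like this (the jar
name is illustrative):

    # long-running session: JobManager/TaskManagers stay up between jobs
    ./bin/yarn-session.sh -n 1 -tm 2048 -s 4

    # submits to the session started above (note: no -m yarn-cluster)
    ./bin/flink run ./myjob.jar

    # per-job mode: starts its own YARN application just for this job
    ./bin/flink run -m yarn-cluster -yn 1 ./myjob.jar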


On Mon, Jan 4, 2016 at 12:36 PM, Ana M. Martinez wrote:
Hi Till,

Sorry for the delay (Xmas break). I have activated log aggregation on 
flink-conf.yaml with yarn.log-aggregation-enable: true (as I can’t find a 
yarn-site.xml).
But the command yarn logs -applicationId application_1451903796996_0008 gives 
me the following output:

INFO client.RMProxy: Connecting to ResourceManager at xxx
/var/log/hadoop-yarn/apps/hadoop/logs/application_1451903796996_0008 does not 
exist.
Log aggregation has not completed or is not enabled

I’ve tried to restart the Flink JobManager and TaskManagers as follows:
./bin/yarn-session.sh -n 1 -tm 2048 -s 4
and then with a detached screen, run my application with ./bin/flink run -m 
yarn-cluster ...

I am not sure whether my problem is that I am not setting the log-aggregation-enable 
property correctly or that I am not restarting the Flink JobManager and TaskManagers as I 
should… Any idea?

Thanks,
Ana

On 18 Dec 2015, at 16:29, Till Rohrmann 
> wrote:

In which log file are you exactly looking for the logging statements? And on 
what machine? You have to look on the machines on which the YARN containers were 
started. Alternatively, if you have log aggregation activated, then you can 
simply retrieve the log files via yarn logs.

Cheers,
Till

On Fri, Dec 18, 2015 at 3:49 PM, Ana M. Martinez wrote:
Hi Till,

Many thanks for your quick response.

I have modified the WordCountExample to reproduce my problem in a simple 
example.

I run the code below with the following command:
./bin/flink run -m yarn-cluster -yn 1 -ys 4 -yjm 1024 -ytm 1024 -c 
mypackage.WordCountExample ../flinklink.jar

And if I check the log file I see all logger messages except the one in the 
flatMap function of the inner LineSplitter class, which is actually the one I 
am most interested in.

Is that expected behaviour?

Thanks,
Ana


import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class WordCountExample {
    static Logger logger = LoggerFactory.getLogger(WordCountExample.class);

    public static void main(String[] args) throws Exception {
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        logger.info("Entering application.");

        DataSet<String> text = env.fromElements(
                "Who's there?",
                "I think I hear them. Stand, ho! Who's there?");

        List<Integer> elements = new ArrayList<Integer>();
        elements.add(0);

        DataSet<TestClass> set = env.fromElements(new TestClass(elements));

        DataSet<Tuple2<String, Integer>> wordCounts = text
                .flatMap(new LineSplitter())
                .withBroadcastSet(set, "set")
                .groupBy(0)
                .sum(1);

        wordCounts.print();
    }

    public static class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {

        static Logger loggerLineSplitter = LoggerFactory.getLogger(LineSplitter.class);

        @Override
        public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
            loggerLineSplitter.info("Logger in LineSplitter.flatMap");
            for (String word : line.split(" ")) {
                out.collect(new Tuple2<String, Integer>(word, 1));
            }
        }
    }

    public static class TestClass implements Serializable {
        private static

Re: Problem to show logs in task managers

2016-01-06 Thread Robert Metzger
Maybe the isConverged() method is never called? To make sure, just
throw a RuntimeException inside the method to see what's happening.

On Wed, Jan 6, 2016 at 3:58 PM, Ana M. Martinez  wrote:

> Hi Till,
>
> I am afraid it does not work in any case.
>
> I am following the steps you indicate in your documentation (for YARN
> configuration and logging with slf4j):
>
> 1) Enable log aggregation in yarn-site:
>
> https://ci.apache.org/projects/flink/flink-docs-release-0.7/yarn_setup.html#log-files
>
> 2) Include Loggers as indicated here (see WordCountExample below):
>
> https://ci.apache.org/projects/flink/flink-docs-release-0.7/internal_logging.html
>
> But I cannot get the log messages produced inside the map functions. Am I
> missing something?
>
> Thanks,
> Ana
>
> On 04 Jan 2016, at 14:00, Till Rohrmann  wrote:
>
> I think the YARN application has to be finished in order for the logs to
> be accessible.
>
> Judging from your commands, you’re starting a long-running YARN application
> running Flink with ./bin/yarn-session.sh -n 1 -tm 2048 -s 4. This cluster
> won’t be used though, because you’re executing your job with ./bin/flink
> run -m yarn-cluster which will start another YARN application which is
> only alive as long as the Flink job is executed. If you want to run your
> job on the long running YARN application, then you simply have to omit -m
> yarn-cluster.
>
> Cheers,
> Till
>
> On Mon, Jan 4, 2016 at 12:36 PM, Ana M. Martinez  wrote:
>
>> Hi Till,
>>
>> Sorry for the delay (Xmas break). I have activated log aggregation
>> on flink-conf.yaml with yarn.log-aggregation-enable: true (as I can’t find
>> a yarn-site.xml).
>> But the command yarn logs -applicationId application_1451903796996_0008
>> gives me the following output:
>>
>> INFO client.RMProxy: Connecting to ResourceManager at xxx
>> /var/log/hadoop-yarn/apps/hadoop/logs/application_1451903796996_0008 does
>> not exist.
>> Log aggregation has not completed or is not enabled
>>
>>
>> I’ve tried to restart the Flink JobManager and TaskManagers as follows:
>> ./bin/yarn-session.sh -n 1 -tm 2048 -s 4
>> and then with a detached screen, run my application with ./bin/flink run
>> -m yarn-cluster ...
>>
>> I am not sure whether my problem is that I am not setting the
>> log-aggregation-enable property correctly or that I am not restarting the Flink
>> JobManager and TaskManagers as I should… Any idea?
>>
>> Thanks,
>> Ana
>>
>> On 18 Dec 2015, at 16:29, Till Rohrmann  wrote:
>>
>> In which log file are you exactly looking for the logging statements? And
>> on what machine? You have to look on the machines on which the YARN
>> containers were started. Alternatively, if you have log aggregation
>> activated, then you can simply retrieve the log files via yarn logs.
>>
>> Cheers,
>> Till
>>
>> On Fri, Dec 18, 2015 at 3:49 PM, Ana M. Martinez  wrote:
>>
>>> Hi Till,
>>>
>>> Many thanks for your quick response.
>>>
>>> I have modified the WordCountExample to reproduce my problem in a
>>> simple example.
>>>
>>> I run the code below with the following command:
>>> ./bin/flink run -m yarn-cluster -yn 1 -ys 4 -yjm 1024 -ytm 1024 -c
>>> mypackage.WordCountExample ../flinklink.jar
>>>
>>> And if I check the log file I see all logger messages except the one in
>>> the flatMap function of the inner LineSplitter class, which is actually the
>>> one I am most interested in.
>>>
>>> Is that expected behaviour?
>>>
>>> Thanks,
>>> Ana
>>>
>>> import org.apache.flink.api.common.functions.FlatMapFunction;
>>> import org.apache.flink.api.java.DataSet;
>>> import org.apache.flink.api.java.ExecutionEnvironment;
>>> import org.apache.flink.api.java.tuple.Tuple2;
>>> import org.apache.flink.util.Collector;
>>> import org.slf4j.Logger;
>>> import org.slf4j.LoggerFactory;
>>>
>>> import java.io.Serializable;
>>> import java.util.ArrayList;
>>> import java.util.List;
>>>
>>> public class WordCountExample {
>>>     static Logger logger = LoggerFactory.getLogger(WordCountExample.class);
>>>
>>>     public static void main(String[] args) throws Exception {
>>>         final ExecutionEnvironment env =
>>>             ExecutionEnvironment.getExecutionEnvironment();
>>>
>>>         logger.info("Entering application.");
>>>
>>>         DataSet<String> text = env.fromElements(
>>>             "Who's there?",
>>>             "I think I hear them. Stand, ho! Who's there?");
>>>
>>>         List<Integer> elements = new ArrayList<Integer>();
>>>         elements.add(0);
>>>
>>>         DataSet<TestClass> set = env.fromElements(new TestClass(elements));
>>>
>>>         DataSet<Tuple2<String, Integer>> wordCounts = text
>>>             .flatMap(new LineSplitter())
>>>             .withBroadcastSet(set, "set")
>>>             .groupBy(0)
>>>             .sum(1);
>>>
>>>         wordCounts.print();
>>>     }
>>>
>>>     public static class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
>>>

sources not available for flink-streaming

2016-01-06 Thread Alex Rovner
*IntelliJ reports the following:*
*Cannot download sources*

Sources not found for:
org.apache.flink:flink-streaming-scala:1.0-SNAPSHOT

Would it be possible to publish sources for this artifact?
-- 
*Alex Rovner*
*Director, Data Engineering *
*o:* 646.759.0052



Re: sources not available for flink-streaming

2016-01-06 Thread Robert Metzger
I also saw that this is not working for SNAPSHOT releases (for stable
releases it seems to work).

We are publishing the source artifacts, as you can see here
https://repository.apache.org/content/repositories/snapshots/org/apache/flink/flink-streaming-scala/1.0-SNAPSHOT/


Maybe it's an issue with IntelliJ?


On Wed, Jan 6, 2016 at 7:41 PM, Alex Rovner 
wrote:

> *IntelliJ reports the following:*
> *Cannot download sources*
>
> Sources not found for:
> org.apache.flink:flink-streaming-scala:1.0-SNAPSHOT
>
> Would it be possible to publish sources for this artifact?
> --
> *Alex Rovner*
> *Director, Data Engineering *
> *o:* 646.759.0052
>
>
>
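If IntelliJ's downloader is indeed the culprit, one possible workaround (an
assumption, not something verified in this thread) is to let Maven resolve
the source jars itself; the IDE usually picks them up afterwards:

    mvn dependency:sources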


Re: flink kafka scala error

2016-01-06 Thread Madhukar Thota
I solved the problem by changing the Scala version of the Kafka library, since
I downloaded the Scala 2.11 build of Flink
(flink-0.10.1-bin-hadoop27-scala_2.11.tgz).

Before:

    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka_2.10</artifactId>
        <version>0.8.2.2</version>
    </dependency>

After:

    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka_2.11</artifactId>
        <version>0.8.2.2</version>
    </dependency>

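As a side note on this fix: the _2.10/_2.11 suffix has to match across every
Scala-dependent artifact in the pom. A sketch of a common convention (the
property-based suffix is a suggestion, not part of the original mail) that
turns the suffix into a single knob:

    <properties>
        <scala.binary.version>2.11</scala.binary.version>
    </properties>

    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka_${scala.binary.version}</artifactId>
        <version>0.8.2.2</version>
    </dependency>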



On Wed, Jan 6, 2016 at 4:13 AM, Till Rohrmann  wrote:

> Hi Madhukar,
>
> could you check whether your Flink installation contains the
> flink-dist-0.10.1.jar in the lib folder? This file contains the necessary
> scala-library.jar which you are missing. You can also remove the line
> org.scala-lang:scala-library which excludes the
> scala-library dependency to be included in the fat jar of your job.
>
> Cheers,
> Till
>
> On Wed, Jan 6, 2016 at 5:54 AM, Madhukar Thota 
> wrote:
>
>> Hi
>>
>> I am seeing the following error when I try to run the jar on the Flink
>> cluster. I am not sure which dependency is missing.
>>
>>  /opt/DataHUB/flink-0.10.1/bin/flink  run datahub-heka-1.0-SNAPSHOT.jar
>> flink.properties
>> java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
>> at kafka.utils.Pool.<init>(Pool.scala:28)
>> at
>> kafka.consumer.FetchRequestAndResponseStatsRegistry$.<init>(FetchRequestAndResponseStats.scala:60)
>> at
>> kafka.consumer.FetchRequestAndResponseStatsRegistry$.<clinit>(FetchRequestAndResponseStats.scala)
>> at kafka.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:39)
>> at
>> kafka.javaapi.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:34)
>> at
>> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer.getPartitionsForTopic(FlinkKafkaConsumer.java:691)
>> at
>> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer.<init>(FlinkKafkaConsumer.java:281)
>> at
>> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer082.<init>(FlinkKafkaConsumer082.java:49)
>> at com.lmig.datahub.heka.Main.main(Main.java:39)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:497)
>> at
>> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:497)
>> at
>> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:395)
>> at
>> org.apache.flink.client.program.Client.runBlocking(Client.java:252)
>> at
>> org.apache.flink.client.CliFrontend.executeProgramBlocking(CliFrontend.java:676)
>> at org.apache.flink.client.CliFrontend.run(CliFrontend.java:326)
>> at
>> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:978)
>> at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1028)
>> Caused by: java.lang.ClassNotFoundException:
>> scala.collection.GenTraversableOnce$class
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> ... 20 more
>>
>> The exception above occurred while trying to run your command.
>>
>>
>> Here is my pom.xml:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <project xmlns="http://maven.apache.org/POM/4.0.0"
>>          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>          xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
>>          http://maven.apache.org/xsd/maven-4.0.0.xsd">
>>     <modelVersion>4.0.0</modelVersion>
>>
>>     <groupId>com.datahub</groupId>
>>     <artifactId>datahub-heka</artifactId>
>>     <version>1.0-SNAPSHOT</version>
>>     <dependencies>
>>         <dependency>
>>             <groupId>org.apache.flink</groupId>
>>             <artifactId>flink-java</artifactId>
>>             <version>0.10.1</version>
>>         </dependency>
>>         <dependency>
>>             <groupId>org.apache.flink</groupId>
>>             <artifactId>flink-streaming-java</artifactId>
>>             <version>0.10.1</version>
>>         </dependency>
>>         <dependency>
>>             <groupId>org.apache.flink</groupId>
>>             <artifactId>flink-clients</artifactId>
>>             <version>0.10.1</version>
>>         </dependency>
>>         <dependency>
>>             <groupId>org.apache.kafka</groupId>
>>             <artifactId>kafka_2.10</artifactId>
>>             <version>0.8.2.2</version>
>>         </dependency>
>>         <dependency>
>>             <groupId>org.apache.flink</groupId>
>>             <artifactId>flink-connector-kafka</artifactId>
>>             <version>0.10.1</version>
>>         </dependency>
>>         <dependency>
>>             <groupId>org.apache.flink</groupId>
>>             <artifactId>flink-connector-elasticsearch</artifactId>
>>             <version>0.10.1</version>
>>         </dependency>
>>         <dependency>
>>             <groupId>org.elasticsearch</groupId>
>>             <artifactId>elasticsearch</artifactId>
>>             <version>1.7.2</version>
>>         </dependency>
>>         <dependency>
>>             <groupId>org.elasticsearch</groupId>
>>             <artifactId>elasticsearch-shield</artifactId>
>>             <version>1.3.3</version>
>>         </dependency>
>>         <dependency>
>>             <groupId>org.elasticsearch</groupId>
>>             <artifactId>elasticsearch-license-plugin</artifactId>
>>             <version>1.0.0</version>
>>         </dependency>
>>         <dependency>
>>             <groupId>com.fasterxml.jackson.core</groupId>
>>             <artifactId>jackson-core</artifactId>
>>             <version>2.6.4</version>
>>         </dependency>
>>         <dependency>
>>             <groupId>com.fasterxml.jackson.core</groupId>
>>             <artifactId>jackson-databind</artifactId>
>>             <version>2.6.4</version>
>>         </dependency>

Re: Monitoring single-run job statistics

2016-01-06 Thread Filip Łęczycki
Hi Stephan,

Thank you for you answer. I would love to contribute but currently I have
no capacity as I am buried with my thesis.

I will reach out after graduating :)

Best regards.
Filip

Regards,
Filip Łęczycki

2016-01-05 10:35 GMT+01:00 Stephan Ewen :

> Hi Filip!
>
> There are thoughts and efforts to extend Flink to push the result
> statistics of Flink jobs to the YARN timeline server. That way, you can
> explore jobs that are completed.
>
> Since the whole web dashboard in Flink has a pure REST design, this is a
> quite straightforward fix.
>
> From the capacities I see in the community, I cannot promise that to be
> fixed immediately. Let me know, though, if you are interested in
> contributing an addition there, and I can walk you through the steps that
> would be needed.
>
> Greetings,
> Stephan
>
>
> On Mon, Jan 4, 2016 at 9:17 PM, Filip Łęczycki 
> wrote:
>
>> Hi Till,
>>
>> Thank you for your answer; however, I am sorry to hear that. I was reluctant
>> to execute jobs on a long-running Flink cluster due to the fact that
>> multiple jobs would cloud the YARN statistics regarding CPU and memory time, as
>> well as Flink's garbage-collector statistics in the logs, as they would be
>> stored for the whole Flink cluster instead of for a single job.
>>
>> Do you know whether there is a way to extract the mentioned stats (CPU time,
>> memory time, GC time) for a single job run on a long-running Flink cluster?
>>
>> I will be very grateful for an answer :)
>>
>> Best regards,
>> Filip
>>
>> Regards,
>> Filip Łęczycki
>>
>> 2016-01-04 10:05 GMT+01:00 Till Rohrmann :
>>
>>> Hi Filip,
>>>
>>> at the moment it is not possible to retrieve the job statistics after
>>> the job has finished with flink run -m yarn-cluster. The reason is that
>>> the YARN cluster is only alive as long as the job is executed. Thus, I
>>> would recommend that you execute your jobs on a long-running Flink cluster
>>> on YARN.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Fri, Jan 1, 2016 at 11:29 PM, Filip Łęczycki wrote:
>>>
 Hi all,

 I am running Flink apps on a YARN cluster and I am trying to get some
 benchmarks. When I start a long-running Flink cluster on my YARN cluster I
 have access to the web UI and REST API that provide statistics on the
 deployed jobs (as described here:
 https://ci.apache.org/projects/flink/flink-docs-master/internals/monitoring_rest_api.html).
 I was wondering whether it is possible to get such information about a single
 job triggered with 'flink run -m yarn-cluster ...'? After the job is
 finished there is no Flink client running, so I cannot use the REST API to get
 stats.

 Thanks for any help :)


 Best regards,
 Filip Łęczycki

>>>
>>>
>>
>
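Until such a push-to-timeline-server feature exists, one workaround is to
poll the JobManager's monitoring REST API while the job is still running. A
minimal sketch (the hostname and the /joboverview endpoint are assumptions
based on the monitoring docs linked above; 8081 is the default web port):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class JobStatsPoller {
        public static void main(String[] args) throws Exception {
            // Job overview; per-job details live under /jobs/<jobid>.
            URL url = new URL("http://jobmanager-host:8081/joboverview");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line); // JSON job statistics
                }
            }
        }
    }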