Re: Moving from single-node, Maven examples to cluster execution

2016-06-16 Thread Prez Cannady
All right, I figured I’d have to do shading, but hadn’t gotten around to 
experimenting.

I’ll try it out.

Prez Cannady  
p: 617 500 3378  
e: revp...@opencorrelate.org   
GH: https://github.com/opencorrelate   
LI: https://www.linkedin.com/in/revprez

> On Jun 16, 2016, at 6:19 PM, Josh  wrote:
> 
> Hi Prez,
> 
> You need to build a jar with all your dependencies bundled inside. With Maven 
> you can use maven-assembly-plugin for this, or with SBT there's sbt-assembly.
> 
> Once you've done this, you can log in to the JobManager node of your Flink 
> cluster, copy the jar across and use the Flink command line tool to submit 
> jobs to the running cluster, e.g. (from the Flink root directory):
> 
> ./bin/flink run -c my.application.MainClass /path/to/YourApp.jar
> 
> See https://ci.apache.org/projects/flink/flink-docs-master/apis/cli.html 
> 
> You can also run the Flink command line tool locally and submit the jar to a 
> remote JobManager with the -m flag, although I don't do this at the moment 
> because it doesn't work so easily if you're running Flink on AWS/EMR.
> 
> Josh
> 
> 
> On Thu, Jun 16, 2016 at 10:51 PM, Prez Cannady wrote:
> I’m having a hard time getting my head around how to deploy my Flink 
> programs to a pre-configured, remote Flink cluster setup.
> 
> My Mavenized setup uses Spring Boot (to simplify class path handling and 
> generate pretty logs) to provision a StreamExecutionEnvironment with Kafka 
> sources and sinks. I can also run this quite effectively the standard way 
> (`java -jar …`).  What I’m unclear on is how I might go about distributing 
> this code to run on an existing Flink cluster setup.  Where do I drop the 
> jars? Do I need to restart Flink to do so?
> 
> class AppRunner extends CommandLineRunner {
> 
>   val log = LoggerFactory.getLogger(classOf[AppRunner])
> 
>   override def run(args: String*): Unit = {
> 
>     val env: StreamExecutionEnvironment =
>       StreamExecutionEnvironment.getExecutionEnvironment
> 
>     val consumer = …
>     val producer = ...
>     val stream = env.addSource(consumer)
> 
>     stream
>       …
>       // Do some stuff
>       …
>       .addSink(producer)
> 
>     env.execute
>   }
> }
> …
> 
> @SpringBootApplication
> object App {
> 
>   @throws(classOf[Exception])
>   def main(args: Array[String]): Unit = {
>     SpringApplication.run(classOf[AppRunner], args: _*)
>   }
> }
> 
> 
> Try as I might, I couldn’t find any clear instructions on how to do this in 
> the documentation.  The cluster documentation ends with starting it.
> 
> https://ci.apache.org/projects/flink/flink-docs-release-0.8/cluster_setup.html#starting-flink
>  
> 
> 
> The Wikiedits example doesn’t involve any third-party dependencies, so I’m 
> not clear on how to manage the class path for it.
> 
> https://ci.apache.org/projects/flink/flink-docs-release-1.0/quickstart/run_example_quickstart.html
>  
> 
> 
> Any help getting me on the right (preferably best-practices) path would be 
> appreciated.
> 
> 
> Prez Cannady  
> p: 617 500 3378   
> e: revp...@opencorrelate.org   
> GH: https://github.com/opencorrelate   
> LI: https://www.linkedin.com/in/revprez



Re: Moving from single-node, Maven examples to cluster execution

2016-06-16 Thread Josh
Hi Prez,

You need to build a jar with all your dependencies bundled inside. With
Maven you can use maven-assembly-plugin for this, or with SBT there's
sbt-assembly.

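For reference, a minimal sbt-assembly setup might look something like this
(plugin and Flink versions and the project name are illustrative; adjust to
your build, and the maven-assembly-plugin route is analogous on the Maven
side):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

// build.sbt
name := "your-flink-app"
scalaVersion := "2.11.8"

// Flink's core libraries are already on the cluster's classpath, so they can
// be marked "provided" to keep the fat jar small; connectors like Kafka are
// not, so they must be bundled.
libraryDependencies ++= Seq(
  "org.apache.flink" %% "flink-streaming-scala" % "1.0.3" % "provided",
  "org.apache.flink" %% "flink-connector-kafka-0.8" % "1.0.3"
)

Running `sbt assembly` then produces a single jar under target/scala-2.11/
that you can submit as below.
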
Once you've done this, you can log in to the JobManager node of your Flink
cluster, copy the jar across and use the Flink command line tool to submit
jobs to the running cluster, e.g. (from the Flink root directory):

./bin/flink run -c my.application.MainClass /path/to/YourApp.jar

See https://ci.apache.org/projects/flink/flink-docs-master/apis/cli.html

You can also run the Flink command line tool locally and submit the jar to
a remote JobManager with the -m flag, although I don't do this at the
moment because it doesn't work so easily if you're running Flink on AWS/EMR.
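
For example (the host and port are illustrative; 6123 is the default
JobManager RPC port):

./bin/flink run -m jobmanager-host:6123 -c my.application.MainClass /path/to/YourApp.jar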

Josh

On Thu, Jun 16, 2016 at 10:51 PM, Prez Cannady 
wrote:

> I’m having a hard time getting my head around how to deploy my Flink
> programs to a pre-configured, remote Flink cluster setup.
>
> My Mavenized setup uses Spring Boot (to simplify class path handling and
> generate pretty logs) to provision a StreamExecutionEnvironment with Kafka
> sources and sinks. I can also run this quite effectively the standard way
> (`java -jar …`).  What I’m unclear on is how I might go about distributing
> this code to run on an existing Flink cluster setup.  Where do I drop the
> jars? Do I need to restart Flink to do so?
>
> class AppRunner extends CommandLineRunner {
>
>   val log = LoggerFactory.getLogger(classOf[AppRunner])
>
>   override def run(args: String*): Unit = {
>
>     val env: StreamExecutionEnvironment =
>       StreamExecutionEnvironment.getExecutionEnvironment
>
>     val consumer = …
>     val producer = ...
>     val stream = env.addSource(consumer)
>
>     stream
>       …
>       // Do some stuff
>       …
>       .addSink(producer)
>
>     env.execute
>   }
> }
> …
>
> @SpringBootApplication
> object App {
>
>   @throws(classOf[Exception])
>   def main(args: Array[String]): Unit = {
>     SpringApplication.run(classOf[AppRunner], args: _*)
>   }
> }
>
>
> Try as I might, I couldn’t find any clear instructions on how to do this
> in the documentation.  The cluster documentation ends with starting it.
>
>
> https://ci.apache.org/projects/flink/flink-docs-release-0.8/cluster_setup.html#starting-flink
>
> The Wikiedits example doesn’t involve any third-party dependencies, so I’m
> not clear on how to manage the class path for it.
>
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.0/quickstart/run_example_quickstart.html
>
> Any help getting me on the right (preferably best-practices) path would
> be appreciated.
>
>
> Prez Cannady
> p: 617 500 3378
> e: revp...@opencorrelate.org
> GH: https://github.com/opencorrelate
> LI: https://www.linkedin.com/in/revprez


Moving from single-node, Maven examples to cluster execution

2016-06-16 Thread Prez Cannady
I’m having a hard time getting my head around how to deploy my Flink programs 
to a pre-configured, remote Flink cluster setup.

My Mavenized setup uses Spring Boot (to simplify class path handling and 
generate pretty logs) to provision a StreamExecutionEnvironment with Kafka 
sources and sinks. I can also run this quite effectively the standard way 
(`java -jar …`).  What I’m unclear on is how I might go about distributing this 
code to run on an existing Flink cluster setup.  Where do I drop the jars? Do I 
need to restart Flink to do so?

class AppRunner extends CommandLineRunner {

  val log = LoggerFactory.getLogger(classOf[AppRunner])

  override def run(args: String*): Unit = {

    val env: StreamExecutionEnvironment =
      StreamExecutionEnvironment.getExecutionEnvironment

    val consumer = …
    val producer = ...
    val stream = env.addSource(consumer)

    stream
      …
      // Do some stuff
      …
      .addSink(producer)

    env.execute
  }
}
…

@SpringBootApplication
object App {

  @throws(classOf[Exception])
  def main(args: Array[String]): Unit = {
    SpringApplication.run(classOf[AppRunner], args: _*)
  }
}


Try as I might, I couldn’t find any clear instructions on how to do this in the 
documentation.  The cluster documentation ends with starting it.

https://ci.apache.org/projects/flink/flink-docs-release-0.8/cluster_setup.html#starting-flink
 


The Wikiedits example doesn’t involve any third-party dependencies, so I’m not 
clear on how to manage the class path for it.

https://ci.apache.org/projects/flink/flink-docs-release-1.0/quickstart/run_example_quickstart.html
 


Any help getting me on the right (preferably best-practices) path would be 
appreciated.


Prez Cannady  
p: 617 500 3378  
e: revp...@opencorrelate.org   
GH: https://github.com/opencorrelate   
LI: https://www.linkedin.com/in/revprez

Re: cluster execution

2016-02-01 Thread Lydia Ickler
xD…  a simple "hdfs dfs -chmod -R 777 /users" fixed it!


> On 01.02.2016, at 12:17, Till Rohrmann wrote:
> 
> Hi Lydia,
> 
> It looks like that. I guess you should check your HDFS access rights. 
> 
> Cheers,
> Till
> 
> On Mon, Feb 1, 2016 at 11:28 AM, Lydia Ickler wrote:
> Hi Till,
> 
> thanks for your reply!
> I tested it with the Wordcount example.
> Everything works fine if I run the command:
> ./flink run -p 3 /home/flink/examples/WordCount.jar
> Then the program gets executed by my 3 workers. 
> If I want to save the output to a file:
> ./flink run -p 3 /home/flink/examples/WordCount.jar 
> hdfs://grips2:9000/users/Flink_1000.csv 
> hdfs://grips2:9000/users/Wordcount_1000
> 
> I get the following error message:
> What am I doing wrong? Is something wrong with my cluster writing permissions?
> 
> org.apache.flink.client.program.ProgramInvocationException: The program 
> execution failed: Cannot initialize task 'DataSink (CsvOutputFormat (path: 
> hdfs://grips2:9000/users/Wordcount_1000s, delimiter:  ))': Output 
> directory could not be created.
>   at org.apache.flink.client.program.Client.runBlocking(Client.java:370)
>   at org.apache.flink.client.program.Client.runBlocking(Client.java:348)
>   at org.apache.flink.client.program.Client.runBlocking(Client.java:315)
>   at 
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:70)
>   at 
> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:78)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:483)
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:497)
>   at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:395)
>   at org.apache.flink.client.program.Client.runBlocking(Client.java:252)
>   at 
> org.apache.flink.client.CliFrontend.executeProgramBlocking(CliFrontend.java:676)
>   at org.apache.flink.client.CliFrontend.run(CliFrontend.java:326)
>   at 
> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:978)
>   at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1028)
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Cannot 
> initialize task 'DataSink (CsvOutputFormat (path: 
> hdfs://grips2:9000/users/Wordcount_1000s, delimiter:  ))': Output 
> directory could not be created.
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$6.apply(JobManager.scala:867)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$6.apply(JobManager.scala:851)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:851)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:341)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>   at 
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>   at 
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>   at 
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>   at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>   at 
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>   

Re: cluster execution

2016-02-01 Thread Till Rohrmann
Hi Lydia,

It looks like that. I guess you should check your HDFS access rights.

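For example, something along these lines (path and user are illustrative)
would show who owns the target directory and let you open it up for the
user running Flink:

hdfs dfs -ls /users
hdfs dfs -chown -R flink /users    # or relax the permissions with chmod
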
Cheers,
Till

On Mon, Feb 1, 2016 at 11:28 AM, Lydia Ickler 
wrote:

> Hi Till,
>
> thanks for your reply!
> I tested it with the Wordcount example.
> Everything works fine if I run the command:
> ./flink run -p 3 /home/flink/examples/WordCount.jar
> Then the program gets executed by my 3 workers.
> If I want to save the output to a file:
> ./flink run -p 3 /home/flink/examples/WordCount.jar
> hdfs://grips2:9000/users/Flink_1000.csv
> hdfs://grips2:9000/users/Wordcount_1000
>
> I get the following error message:
> What am I doing wrong? Is something wrong with my cluster writing
> permissions?
>
> org.apache.flink.client.program.ProgramInvocationException: The program
> execution failed: Cannot initialize task 'DataSink (CsvOutputFormat (path:
> hdfs://grips2:9000/users/Wordcount_1000s, delimiter:  ))': Output
> directory could not be created.
> at org.apache.flink.client.program.Client.runBlocking(Client.java:370)
> at org.apache.flink.client.program.Client.runBlocking(Client.java:348)
> at org.apache.flink.client.program.Client.runBlocking(Client.java:315)
> at
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:70)
> at
> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:78)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:497)
> at
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:395)
> at org.apache.flink.client.program.Client.runBlocking(Client.java:252)
> at
> org.apache.flink.client.CliFrontend.executeProgramBlocking(CliFrontend.java:676)
> at org.apache.flink.client.CliFrontend.run(CliFrontend.java:326)
> at
> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:978)
> at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1028)
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Cannot
> initialize task 'DataSink (CsvOutputFormat (path:
> hdfs://grips2:9000/users/Wordcount_1000s, delimiter:  ))': Output
> directory could not be created.
> at
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$6.apply(JobManager.scala:867)
> at
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$6.apply(JobManager.scala:851)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:851)
> at
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:341)
> at
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
> at
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
> at
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
> at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36)
> at
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
> at
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
> at
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
> at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
> at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
> at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
> at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
> at
> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
> at akka.actor.ActorCell.invoke(ActorCell.scala:487)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
> at akka.dispatch.Mailbox.run(Mailbox.scala:221)
> at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at
> scala.concurrent.forkjoin.ForkJoinWorkerTh

Re: cluster execution

2016-02-01 Thread Lydia Ickler
Hi Till,

thanks for your reply!
I tested it with the Wordcount example.
Everything works fine if I run the command:
./flink run -p 3 /home/flink/examples/WordCount.jar
Then the program gets executed by my 3 workers. 
If I want to save the output to a file:
./flink run -p 3 /home/flink/examples/WordCount.jar 
hdfs://grips2:9000/users/Flink_1000.csv hdfs://grips2:9000/users/Wordcount_1000 


I get the following error message:
What am I doing wrong? Is something wrong with my cluster writing permissions?

org.apache.flink.client.program.ProgramInvocationException: The program 
execution failed: Cannot initialize task 'DataSink (CsvOutputFormat (path: 
hdfs://grips2:9000/users/Wordcount_1000s, delimiter:  ))': Output directory 
could not be created.
at org.apache.flink.client.program.Client.runBlocking(Client.java:370)
at org.apache.flink.client.program.Client.runBlocking(Client.java:348)
at org.apache.flink.client.program.Client.runBlocking(Client.java:315)
at 
org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:70)
at 
org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:78)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:497)
at 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:395)
at org.apache.flink.client.program.Client.runBlocking(Client.java:252)
at 
org.apache.flink.client.CliFrontend.executeProgramBlocking(CliFrontend.java:676)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:326)
at 
org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:978)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1028)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Cannot 
initialize task 'DataSink (CsvOutputFormat (path: 
hdfs://grips2:9000/users/Wordcount_1000s, delimiter:  ))': Output directory 
could not be created.
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$6.apply(JobManager.scala:867)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$6.apply(JobManager.scala:851)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at 
org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:851)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:341)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at 
org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at 
org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
at 
org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at 
org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at 
org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
at akka.dispatch.Mailbox.run(Mailbox.scala:221)
at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoin

Re: cluster execution

2016-01-28 Thread Till Rohrmann
Hi Lydia,

what do you mean by master? Usually, when you submit a program to the
cluster and don’t specify the parallelism in your program, it will be
executed with the parallelism.default value as its parallelism. You can
specify that value in your cluster configuration file, flink-conf.yaml.
Alternatively, you can always specify the parallelism via the CLI client
with the -p option.

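For example (values illustrative):

# conf/flink-conf.yaml on the cluster
parallelism.default: 3

or per job from the CLI:

./bin/flink run -p 3 /path/to/YourJob.jar

or hard-coded in the program via env.setParallelism(3).
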
Cheers,
Till

On Thu, Jan 28, 2016 at 9:53 AM, Lydia Ickler 
wrote:

> Hi all,
>
> I am doing some operations on a DataSet<Tuple3<Integer, Integer, Double>> …
> (see code below)
> When I run my program on a cluster with 3 machines I can see within the
> web client that only my master is executing the program.
> Do I have to specify somewhere that all machines have to participate?
> Usually the cluster executes in parallel.
>
> Any suggestions?
>
> Best regards,
> Lydia
>
> DataSet<Tuple3<Integer, Integer, Double>> matrixA = readMatrix(env, input);
>
> DataSet<Tuple3<Integer, Integer, Double>> initial = matrixA.groupBy(0).sum(2);
>
> //normalize by maximum value
> initial = initial.cross(initial.max(2)).map(new normalizeByMax());
>
> matrixA.join(initial).where(1).equalTo(0)
>
>   .map(new ProjectJoinResultMapper()).groupBy(0, 1).sum(2);


cluster execution

2016-01-28 Thread Lydia Ickler
Hi all,

I am doing some operations on a DataSet<Tuple3<Integer, Integer, Double>> … (see 
code below)
When I run my program on a cluster with 3 machines I can see within the web 
client that only my master is executing the program. 
Do I have to specify somewhere that all machines have to participate? Usually 
the cluster executes in parallel.

Any suggestions?

Best regards, 
Lydia
DataSet<Tuple3<Integer, Integer, Double>> matrixA = readMatrix(env, input);
DataSet<Tuple3<Integer, Integer, Double>> initial = matrixA.groupBy(0).sum(2);

//normalize by maximum value
initial = initial.cross(initial.max(2)).map(new normalizeByMax());
matrixA.join(initial).where(1).equalTo(0)
  .map(new ProjectJoinResultMapper()).groupBy(0, 1).sum(2);



Re: Cluster Execution - log

2015-07-23 Thread Juan Fumero
Hi Stephan,
  yes, now it is solved. I was running an older version of the client by
mistake. 

Thanks
Juan

On Thu, 2015-07-23 at 15:31 +0200, Stephan Ewen wrote:
> Hi!
> 
> 
> Seems that you have different versions of the code running locally and
> on the cluster. Is it possible that you did a code update locally
> (client), and forgot to update the cluster (or the other way around)?
> 
> 
> Greetings,
> Stephan
> 
> 
> 
> On Thu, Jul 23, 2015 at 3:10 PM, Juan Fumero
>  wrote:
> Hi,
>When I execute the app from Java on a cluster, the user app
> is blocked at this point:
> 
> log4j:WARN No appenders could be found for logger
> (org.apache.flink.api.java.ExecutionEnvironment).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See
> http://logging.apache.org/log4j/1.2/faq.html#noconfig for more
> info.
> 
> 
> I get this message in the jobManager log:
> 
> 15:03:53,226 WARN  akka.remote.ReliableDeliverySupervisor
> - Association with remote system
> [akka.tcp://flink@127.0.0.1:60400] has failed, address is now
> gated for [5000] ms. Reason is: [Disassociated].
> 15:03:55,368 WARN  akka.remote.ReliableDeliverySupervisor
> - Association with remote system
> [akka.tcp://flink@127.0.0.1:53211] has failed, address is now
> gated for [5000] ms. Reason is:
> [org.apache.flink.api.common.ExecutionConfig
> $GlobalJobParameters; local class incompatible: stream
> classdesc serialVersionUID = 5926998565449078053, local class
> serialVersionUID = 1].
> 
> The user function is just a class which implements
> MapFunction. Any idea? 
> 
> Many thanks
> Juan
> 
> 




Re: Cluster Execution - log

2015-07-23 Thread Stephan Ewen
Hi!

Seems that you have different versions of the code running locally and on
the cluster. Is it possible that you did a code update locally (client),
and forgot to update the cluster (or the other way around)?

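As background, the "local class incompatible" line in the log is plain Java
serialization at work. A tiny self-contained sketch of the mechanism (class
and object names are made up for illustration; this is not Flink code):

import java.io._

@SerialVersionUID(1L)
class Params(val name: String) extends Serializable

object VersionMismatchDemo {
  def main(args: Array[String]): Unit = {
    // Serialize an instance, as the client does when shipping the job config.
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(new Params("demo"))
    oos.close()

    // Deserialize it, as the JobManager does on receipt. If the receiving
    // JVM had loaded a Params class compiled with a *different*
    // serialVersionUID, readObject would throw java.io.InvalidClassException,
    // which is the error the log above reports.
    val in = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray))
    println(in.readObject().asInstanceOf[Params].name)
  }
}
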
Greetings,
Stephan


On Thu, Jul 23, 2015 at 3:10 PM, Juan Fumero <
juan.jose.fumero.alfo...@oracle.com> wrote:

>  Hi,
>When I execute the app from Java on a cluster, the user app is blocked
> at this point:
>
> log4j:WARN No appenders could be found for logger
> (org.apache.flink.api.java.ExecutionEnvironment).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
> more info.
>
>
> I get this message in the jobManager log:
>
> 15:03:53,226 WARN
> akka.remote.ReliableDeliverySupervisor- Association
> with remote system [akka.tcp://flink@127.0.0.1:60400] has failed, address
> is now gated for [5000] ms. Reason is: [Disassociated].
> 15:03:55,368 WARN
> akka.remote.ReliableDeliverySupervisor- Association
> with remote system [akka.tcp://flink@127.0.0.1:53211] has failed, address
> is now gated for [5000] ms. Reason is:
> [org.apache.flink.api.common.ExecutionConfig$GlobalJobParameters; local
> class incompatible: stream classdesc serialVersionUID =
> 5926998565449078053, local class serialVersionUID = 1].
>
> The user function is just a class which implements MapFunction. Any idea?
>
> Many thanks
> Juan
>


Cluster Execution - log

2015-07-23 Thread Juan Fumero
Hi,
   When I execute the app from Java on a cluster, the user app is
blocked at this point:

log4j:WARN No appenders could be found for logger
(org.apache.flink.api.java.ExecutionEnvironment).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
more info.


I get this message in the jobManager log:

15:03:53,226 WARN  akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@127.0.0.1:60400] has
failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15:03:55,368 WARN  akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@127.0.0.1:53211] has
failed, address is now gated for [5000] ms. Reason is:
[org.apache.flink.api.common.ExecutionConfig$GlobalJobParameters; local
class incompatible: stream classdesc serialVersionUID =
5926998565449078053, local class serialVersionUID = 1].

The user function is just a class which implements MapFunction. Any
idea? 

Many thanks
Juan


Re: Cluster execution -jar files-

2015-07-16 Thread Matthias J. Sax
As the JavaDoc explains:

>* @param jarFiles The JAR files with code that needs to be shipped to 
> the cluster. If the program uses
>* user-defined functions, user-defined input formats, 
> or any libraries, those must be
>* provided in the JAR files.

-> external libraries, yes
-> your program code, no
  -> except your UDFs, those yes

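A minimal Scala sketch of what that looks like (host, port and jar path are
illustrative; 6123 is the default JobManager RPC port):

import org.apache.flink.api.scala._

object RemoteJob {
  def main(args: Array[String]): Unit = {
    // The driver runs locally and builds the plan; only the jar(s) with
    // your UDFs and third-party libraries are shipped to the cluster.
    val env = ExecutionEnvironment.createRemoteEnvironment(
      "jobmanager-host", 6123, "/path/to/your-udfs.jar")

    env.fromElements("a", "b", "a")
      .map(w => (w, 1))
      .groupBy(0)
      .sum(1)
      .print()
  }
}
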
-Matthias


On 07/16/2015 04:06 PM, Juan Fumero wrote:
> Missing reference:
> 
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-0.9/apis/cluster_execution.html
> 
> On Thu, 2015-07-16 at 16:04 +0200, Juan Fumero wrote:
>> Hi, 
>>   I would like to use the createRemoteEnvironment to run the application
>> in a cluster and I have some questions. Following the documentation in
>> [1], it is not clear to me how to use it.
>>
>> What should be the content of the jar file? All the external libraries
>> that I use? Or do I need to include the map/reduce program to be
>> distributed as well? In the latter case, why should I redefine all the
>> operations again in the main source? Shouldn't they be included in the
>> jar files?
>>
>> Many thanks
>> Juan
>>
> 
> 





Re: Cluster execution -jar files-

2015-07-16 Thread Juan Fumero
Missing reference:

[1]
https://ci.apache.org/projects/flink/flink-docs-release-0.9/apis/cluster_execution.html

On Thu, 2015-07-16 at 16:04 +0200, Juan Fumero wrote:
> Hi, 
>   I would like to use the createRemoteEnvironment to run the application
> in a cluster and I have some questions. Following the documentation in
> [1], it is not clear to me how to use it.
> 
> What should be the content of the jar file? All the external libraries
> that I use? Or do I need to include the map/reduce program to be
> distributed as well? In the latter case, why should I redefine all the
> operations again in the main source? Shouldn't they be included in the
> jar files? 
> 
> Many thanks
> Juan
> 




Cluster execution -jar files-

2015-07-16 Thread Juan Fumero
Hi, 
  I would like to use the createRemoteEnvironment to run the application
in a cluster and I have some questions. Following the documentation in
[1], it is not clear to me how to use it.

What should be the content of the jar file? All the external libraries
that I use? Or do I need to include the map/reduce program to be
distributed as well? In the latter case, why should I redefine all the
operations again in the main source? Shouldn't they be included in the
jar files? 

Many thanks
Juan