Re: Apache Flink Operator State as Query Cache

2015-11-08 Thread Welly Tambunan
Thanks for the answer.

Currently, the approach I'm using is to create a base/marker interface so
that different message types can be streamed to the same operator. I'm not
sure about the performance hit of this compared to the CoFlatMap
function.

Basically this is providing a query cache, so I'm thinking that instead of
using an in-memory cache like Redis, Ignite, etc., I can just use operator
state for this.

I just want to gauge whether I need a memory cache or whether operator state
would be just fine.

However, I'm concerned about Gen 2 garbage collection when we cache our own
state without using operator state. Is there any clarification on that?
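
A minimal sketch of the operator-state-as-cache idea, assuming the
Checkpointed interface of Flink 0.9/0.10; the types and the downsample
placeholder are illustrative, not from this thread:

import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.streaming.api.checkpoint.Checkpointed
import org.apache.flink.util.Collector

import scala.collection.mutable

// Operator state as a query cache: results live in a field that Flink
// snapshots and restores, so no external cache (Redis, Ignite) is needed.
class CachingDownsampler
    extends RichFlatMapFunction[(String, Seq[Double]), (String, Seq[Double])]
    with Checkpointed[mutable.HashMap[String, Seq[Double]]] {

  private var cache = mutable.HashMap[String, Seq[Double]]()

  override def flatMap(query: (String, Seq[Double]),
                       out: Collector[(String, Seq[Double])]): Unit = {
    // Serve repeated queries from operator state instead of recomputing.
    val result = cache.getOrElseUpdate(query._1, downsample(query._2))
    out.collect((query._1, result))
  }

  // Placeholder downsampling: keep every 10th sample.
  private def downsample(points: Seq[Double]): Seq[Double] =
    points.grouped(10).map(_.head).toSeq

  override def snapshotState(checkpointId: Long, timestamp: Long) = cache

  override def restoreState(state: mutable.HashMap[String, Seq[Double]]): Unit =
    cache = state
}
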



On Sat, Nov 7, 2015 at 12:38 AM, Anwar Rizal  wrote:

>
> Let me understand your case better here. You have a stream of models and a
> stream of data. To process the data, you will need a way to access your
> model from the subsequent stream operations (map, filter, flatMap, ...).
> I'm not sure in which case Operator State is a good choice, but I think
> you can also live without it.
>
> val modelStream = ...  // get the model stream
> val dataStream  = ...  // get the data stream
>
> modelStream.broadcast.connect(dataStream).coFlatMap( ... )
>
> Then you can keep the latest model in a RichCoFlatMapFunction, not
> necessarily as Operator State, although maybe OperatorState is a good
> choice too.
>
> Does it make sense to you ?
>
> Anwar
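
A minimal sketch of the broadcast/connect pattern described above, assuming
the streaming Scala API of the 0.9/0.10 era; Model, Event, and Scored are
illustrative types:

import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction
import org.apache.flink.util.Collector

case class Model(version: Int)
case class Event(id: String, value: Double)
case class Scored(id: String, modelVersion: Int)

// Keeps the most recent broadcast model in a plain field and applies it
// to every data element; no operator state is strictly required.
class ApplyLatestModel extends CoFlatMapFunction[Model, Event, Scored] {
  private var current: Model = _

  // Input 1, the broadcast model stream: just remember the latest model.
  override def flatMap1(model: Model, out: Collector[Scored]): Unit = {
    current = model
  }

  // Input 2, the data stream: score events once a model has arrived.
  override def flatMap2(event: Event, out: Collector[Scored]): Unit = {
    if (current != null) out.collect(Scored(event.id, current.version))
  }
}

// Usage: modelStream.broadcast.connect(dataStream).flatMap(new ApplyLatestModel)
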
>
> On Fri, Nov 6, 2015 at 10:21 AM, Welly Tambunan  wrote:
>
>> Hi All,
>>
>> We have high-density data that requires downsampling. However, the
>> downsampling model is very flexible, depending on the client device and user
>> interaction, so it would be wasteful to precompute the results and store them
>> in a database.
>>
>> So we want to use Apache Flink to do the downsampling and cache the result
>> for subsequent queries.
>>
>> We are considering using Flink operator state for that.
>>
>> Is operator state the right approach to use as a memory cache? Or would it
>> be preferable to use an in-memory cache like Redis, etc.?
>>
>> Any comments will be appreciated.
>>
>>
>> Cheers
>> --
>> Welly Tambunan
>> Triplelands
>>
>> http://weltam.wordpress.com
>> http://www.triplelands.com 
>>
>
>


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com 


Re: Flink execution time benchmark.

2015-11-08 Thread rmetzger0
Hi Saleh,
I'm sorry that nobody replied to your message. I think you were not
subscribed to the user@flink.apache.org mailing list when you posted this
question; therefore, the Flink community didn't receive your message.

Did you resolve the issue in the meantime, or are you still looking for help?





Re: Creating a Source using an Akka actor

2015-11-08 Thread rmetzger0
Hi Hector,

I'm sorry that nobody replied to your message so far. I think you were not
subscribed to the user@flink.apache.org mailing list when you posted this
question; therefore, the Flink community didn't receive your message. To
subscribe, send an email to user-subscr...@flink.apache.org.

Regarding your question: there is currently no way to receive Akka messages
in a DataStream, so you have to implement the actor system yourself using
the "SourceFunction" interface.
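
A minimal sketch of that approach, assuming the SourceFunction
run(SourceContext)/cancel() contract of Flink 0.10 and an Akka 2.3-era actor
system; all names are illustrative:

import java.util.concurrent.{LinkedBlockingQueue, TimeUnit}

import akka.actor.{Actor, ActorSystem, Props}
import org.apache.flink.streaming.api.functions.source.SourceFunction
import org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext

// Hands every incoming string message over to a shared queue.
class Receiver(queue: LinkedBlockingQueue[String]) extends Actor {
  def receive = { case msg: String => queue.put(msg) }
}

class AkkaSource extends SourceFunction[String] {
  @volatile private var running = true

  override def run(ctx: SourceContext[String]): Unit = {
    // The actor system lives inside the source, one per parallel subtask.
    val queue  = new LinkedBlockingQueue[String]()
    val system = ActorSystem("flink-akka-source")
    system.actorOf(Props(new Receiver(queue)), "receiver")
    while (running) {
      // Poll with a timeout so cancel() can terminate the loop.
      val msg = queue.poll(100, TimeUnit.MILLISECONDS)
      if (msg != null) ctx.collect(msg)
    }
    system.shutdown()
  }

  override def cancel(): Unit = { running = false }
}
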

I'm not aware of any Apache Camel endpoints, or any plans to implement
those. However, contributions are always welcome ;)

Regards,
Robert





finite subset of an infinite data stream

2015-11-08 Thread rss rss
Hello,



I need to extract a finite subset, like a data buffer, from an infinite data
stream. The best option for me would be a finite stream containing the data
accumulated over the previous minute (for example), but I have not found any
existing technique to do this.



As possible ways to get something close to a subset of a stream, I see the
following options:

- a transformation operation like 'take_while' that produces a new stream but
is able to switch it to a FINISHED state. Unfortunately, I have not found a
way to switch the state of a stream from user code in transformation
functions;

- new DataStream or StreamSource constructors that allow connecting a data
processing chain to the source stream. This could be something like the
take_while transformation mentioned above, or a modified StreamSource.run
method fed with data from the source stream.



So I have two questions.

1) Is there any technique to extract accumulated data from a stream as a
stream (to union it with another stream)? This would be a pure buffer mode.

2) If the answer to the first question is negative, is there something like a
take_while transformation, or should I think about a custom implementation of
it? Is it possible to implement it without modifying the core of Flink?
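
For illustration, the closest approximation that needs no changes to Flink's
core is a stateful flatMap that forwards elements only while a predicate
holds; the stream itself keeps running and merely stops emitting, and with
parallelism greater than one each subtask keeps its own flag. A hypothetical
sketch:

import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.util.Collector

// take_while approximation: emit elements until the predicate first fails,
// then drop everything. User code cannot switch the stream to FINISHED.
class TakeWhile[T](predicate: T => Boolean) extends RichFlatMapFunction[T, T] {
  private var emitting = true

  override def flatMap(value: T, out: Collector[T]): Unit = {
    if (emitting && predicate(value)) out.collect(value)
    else emitting = false
  }
}

// Usage (Event and cutoff are placeholders):
//   stream.flatMap(new TakeWhile[Event](_.timestamp < cutoff))
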



Regards,

Roman


Re: Flink on EC2

2015-11-08 Thread Thomas Götzinger
Sorry for the confusion,

the Flink cluster throws the following stack trace:

org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Failed to submit job 29a2a25d49aa0706588ccdb8b7e81c6c (Flink Java Job at Sun Nov 08 18:50:52 UTC 2015)
    at org.apache.flink.client.program.Client.run(Client.java:413)
    at org.apache.flink.client.program.Client.run(Client.java:356)
    at org.apache.flink.client.program.Client.run(Client.java:349)
    at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:63)
    at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:789)
    at de.fraunhofer.iese.proopt.Template.run(Template.java:112)
    at de.fraunhofer.iese.proopt.Main.main(Main.java:8)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
    at org.apache.flink.client.program.Client.run(Client.java:315)
    at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:582)
    at org.apache.flink.client.CliFrontend.run(CliFrontend.java:288)
    at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:878)
    at org.apache.flink.client.CliFrontend.main(CliFrontend.java:920)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Failed to submit job 29a2a25d49aa0706588ccdb8b7e81c6c (Flink Java Job at Sun Nov 08 18:50:52 UTC 2015)
    at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:594)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:190)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:36)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:92)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
    at akka.dispatch.Mailbox.run(Mailbox.scala:221)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.runtime.JobException: Creating the input splits caused an error: No file system found with scheme s3n, referenced in file URI 's3n://big-data-benchmark/pavlo/text/tiny/rankings'.
    at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:162)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.attachJobGraph(ExecutionGraph.java:469)
    at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:534)
    ... 19 more
Caused by: java.io.IOException: No file system found with scheme s3n, referenced in file URI 's3n://big-data-benchmark/pavlo/text/tiny/rankings'.
    at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:247)
    at org.apache.flink.core.fs.Path.getFileSystem(Path.java:309)
    at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:447)
    at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:57)
    at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:146)
    ... 21 more

-- 
Best regards

Thomas Götzinger
Freelance Computer Scientist

Glockenstraße 2a
D-66882 Hütschenhausen OT Spesbach

Mobile: +49 (0)176

Re: Flink on EC2

2015-11-08 Thread Thomas Götzinger
Hi Fabian,

thanks for the reply. I use a Karamel recipe to install Flink on EC2.
Currently I am using flink-0.9.1-bin-hadoop24.tgz.

That archive includes the NativeS3FileSystem. First I tried the standard
Karamel recipe on GitHub, hopshadoop/flink-chef, but it is on version 0.9.0
and the S3NFileSystem is not included.
So I forked the GitHub project as goetzingert/flink-chef.
Although the class file is included, the application throws a
ClassNotFoundException for the class above.
In my project I added the following to conf/core-site.xml:

<configuration>
  <property>
    <name>fs.s3n.impl</name>
    <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
  </property>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>….</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>...</value>
  </property>
</configuration>
I also tried the programmatic configuration:

XMLConfiguration config = new XMLConfiguration(configPath);

env = ExecutionEnvironment.getExecutionEnvironment();
Configuration configuration = GlobalConfiguration.getConfiguration();
configuration.setString("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
configuration.setString("fs.s3n.awsAccessKeyId", "..");
configuration.setString("fs.s3n.awsSecretAccessKey", "../");
configuration.setString("fs.hdfs.hdfssite", Template.class.getResource("/conf/core-site.xml").toString());
GlobalConfiguration.includeConfiguration(configuration);


Any idea why the class is not on the classpath? Is there another script to
set up Flink on an EC2 cluster?

When will Flink 0.10 be released?



Regards

Thomas Götzinger
Freelance Computer Scientist

Glockenstraße 2a
D-66882 Hütschenhausen OT Spesbach

Mobile: +49 (0)176 82180714
Private: +49 (0) 6371 954050

mailto:m...@simplydevelop.de
epost: thomas.goetzin...@epost.de




> On 29.10.2015, at 09:47, Fabian Hueske  wrote:
> 
> Hi Thomas,
> 
> until recently, Flink provided its own implementation of an S3FileSystem,
> which wasn't fully tested and was buggy.
> We removed that implementation and are now using (in 0.10-SNAPSHOT) Hadoop's
> S3 implementation by default.
> 
> If you want to continue using 0.9.1, you can configure Flink to use Hadoop's
> implementation. See this answer on StackOverflow and the linked email thread
> [1].
> If you switch to the 0.10-SNAPSHOT version (which will be released in a few
> days as 0.10.0), things become a bit easier, as Hadoop's implementation is
> used by default. The documentation shows how to configure your access keys [2].
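
For reference, the 0.9.1 route in [1] amounts to registering the s3n keys in
a Hadoop core-site.xml (like the one quoted earlier in this thread) and
pointing Flink at that directory in flink-conf.yaml; a sketch, with an
illustrative path:

# flink-conf.yaml: directory containing the core-site.xml with the fs.s3n.* keys
fs.hdfs.hadoopconf: /etc/hadoop/conf
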
> 
> Please don't hesitate to ask if something is unclear or not working.
> 
> Best, Fabian
> 
> [1] 
> http://stackoverflow.com/questions/32959790/run-apache-flink-with-amazon-s3 
> 
> [2] 
> https://ci.apache.org/projects/flink/flink-docs-master/apis/example_connectors.html
>  
> 
> 
> 2015-10-29 9:35 GMT+01:00 Thomas Götzinger:
> Hello Flink Team,
> 
> We at Fraunhofer IESE are evaluating Flink for a project, and I'm a bit
> frustrated at the moment.
> 
> I've written a few test cases with the Flink API and want to deploy them to
> a Flink EC2 cluster. I set up the cluster using the Karamel recipe that was
> presented in the following video:
> 
> http://www.youtube.com/watch?v=m_SkhyMV0to
>  
> 
> 
> The setup works fine and the hello-flink app runs. But afterwards I wanted
> to copy some data from an S3 bucket to the local EC2 HDFS cluster.
> 
> Running hadoop fs -ls against s3n works, as does cat, ...
> But if I try to copy the data with distcp, the command freezes and does not
> respond until it times out.
> 
> After trying a few things I gave up and started on another solution:
> accessing the S3 bucket directly with Flink and importing the data using a
> small Flink program that just reads from S3 and writes to local Hadoop.
> This works fine locally, but on the cluster the S3NFileSystem class is
> missing (ClassNotFoundException), although it is included in the jar file of
> the installation.
> 
> 
> I forked the Chef recipe and updated it to Flink 0.9.1, but I hit the same issue.
> 
> Is there another simple script to install Flink with Hadoop on an EC2
> cluster and a working s3n file system?
> 
> 
> 
> 
> Freelancer
> 
> on behalf of Fraunhofer IESE