Re: checkpoint notifier not found?

2016-12-09 Thread Abhishek R. Singh
I was following the official documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/streaming/state.html Looks like this is the right one to be using: import

checkpoint notifier not found?

2016-12-09 Thread Abhishek R. Singh
I can’t seem to find CheckpointNotifier. Appreciate help! CheckpointNotifier is not a member of package org.apache.flink.streaming.api.checkpoint. From my pom.xml: groupId org.apache.flink, artifactId flink-scala_2.11, version 1.1.3

Testing Flink Streaming applications - controlling the clock

2016-12-09 Thread Rohit Agarwal
Hi, I am writing tests for my flink streaming application. I mostly use event-time. But there are some aspects which are still controlled by wall-clock time. For example, I am using AssignerWithPeriodicWatermarks and so watermarks are triggered based on wall-clock time. Similarly, checkpoints are

Re: Reg. custom sinks in Flink

2016-12-09 Thread Meghashyam Sandeep V
Thanks a lot for the quick reply Shannon. 1. I will create a class that extends SinkFunction and write my connection logic there. My only question here is: will a dbSession be created for each message/partition, which might affect the performance? That's the reason why I added this line to create a

Re: Reg. custom sinks in Flink

2016-12-09 Thread Shannon Carey
You haven't really added a sink in Flink terminology, you're just performing a side effect within a map operator. So while it may work, if you want to add a sink proper you need have an object that extends SinkFunction or RichSinkFunction. The method call on the stream should be ".addSink(…)".
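Shannon's point, that a side effect inside a map operator is not a sink, can be sketched as follows. This is a self-contained illustration of the open/invoke/close lifecycle that a Flink RichSinkFunction follows; the Flink types themselves are stubbed out here, and the "session" stand-in is hypothetical, so treat this as a shape sketch rather than real Flink code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stub of the RichSinkFunction lifecycle, for illustration only:
// open() runs once per parallel sink instance, invoke() once per record,
// close() once at shutdown -- so one connection serves many records.
abstract class StubRichSinkFunction<T> {
    public void open() throws Exception {}
    public abstract void invoke(T value) throws Exception;
    public void close() throws Exception {}
}

class CassandraLikeSink extends StubRichSinkFunction<String> {
    // Hypothetical stand-in for a dbSession; created once in open(), not per record.
    private List<String> session;

    @Override
    public void open() {
        session = new ArrayList<>(); // imagine: cluster.connect() with custom auth
    }

    @Override
    public void invoke(String value) {
        session.add(value); // imagine: session.execute(insertStatement.bind(value))
    }

    @Override
    public void close() {
        // imagine: session.close()
    }

    public int writtenCount() {
        return session.size();
    }
}

public class SinkLifecycleSketch {
    public static void main(String[] args) throws Exception {
        CassandraLikeSink sink = new CassandraLikeSink();
        sink.open();                         // one "connection" per parallel instance
        for (String record : new String[] {"a", "b", "c"}) {
            sink.invoke(record);             // the same session is reused per record
        }
        System.out.println("records written: " + sink.writtenCount());
        sink.close();
    }
}
```

In an actual job the class would extend org.apache.flink.streaming.api.functions.sink.RichSinkFunction and be attached with `messageStream.addSink(new CassandraLikeSink())`, as Shannon describes.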

How to retrieve values from yarn.taskmanager.env in a Job?

2016-12-09 Thread Shannon Carey
This thread http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/passing-environment-variables-to-flink-program-td3337.html describes the impetus for the addition of yarn.taskmanager.env. I have configured a value within yarn.taskmanager.env, and I see it appearing in the Flink

Reg. custom sinks in Flink

2016-12-09 Thread Meghashyam Sandeep V
Hi there, I have a Flink streaming app where my source is Kafka and a custom sink to Cassandra (I can't use the standard C* sink that comes with Flink as I have customized auth to C*). I currently have the following: messageStream .rebalance() .map( s-> { return

Re: [DISCUSS] "Who's hiring on Flink" monthly thread in the mailing lists?

2016-12-09 Thread Greg Hogan
Google indexes the mailing list. Anyone can filter the messages to trash in a few clicks. This will also be a means for the community to better understand which and how companies are using Flink. On Fri, Dec 9, 2016 at 8:27 AM, Felix Neutatz wrote: > Hi, > > I wonder

Re: Thread 'SortMerger spilling thread' terminated due to an exception: No space left on device

2016-12-09 Thread Fabian Hueske
It looks like the result you are trying to fetch with collect() is too large. collect() only works for results up to 10 MB. I would write the result to a file and read that file in. Best, Fabian 2016-12-09 16:30 GMT+01:00 Miguel Coimbra : > Hello Fabian, > > So if
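Fabian's workaround, writing the result to a file instead of pulling it to the client with collect(), would in a real Flink batch job be `result.writeAsText(path)` followed by `env.readTextFile(path)` in the follow-up program. A self-contained sketch of the same round-trip using plain Java I/O (the file name and toy data are arbitrary):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class FileRoundTrip {
    public static void main(String[] args) throws IOException {
        // Stand-in for a large result that would exceed collect()'s
        // ~10 MB transfer limit if fetched to the client program.
        List<String> result = Arrays.asList("vertex-1,labelA", "vertex-2,labelB");

        // Write the result out (a Flink job would use result.writeAsText(path)).
        Path out = Files.createTempFile("flink-result", ".txt");
        Files.write(out, result);

        // Read it back in a follow-up program (env.readTextFile(path) in Flink).
        List<String> readBack = Files.readAllLines(out);
        System.out.println("lines read back: " + readBack.size());
        Files.deleteIfExists(out);
    }
}
```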

Re: How to analyze space usage of Flink algorithms

2016-12-09 Thread Greg Hogan
This does sound like a nice feature, both per-job and per-taskmanager bytes written to and read from disk. On Fri, Dec 9, 2016 at 8:51 AM, Chesnay Schepler wrote: > We do not measure how much data we are spilling to disk. > > > On 09.12.2016 14:43, Fabian Hueske wrote: > >

Re: Thread 'SortMerger spilling thread' terminated due to an exception: No space left on device

2016-12-09 Thread Miguel Coimbra
Hello Fabian, So if I want to have 10 nodes with one working thread each, I would just set this, I assume: taskmanager.numberOfTaskSlots: 1 parallelism.default: 10 There is progress, albeit little. I am now running on a directory with more space. For 10 iterations of label propagation, I am
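The two settings Miguel quotes, as they would appear in flink-conf.yaml (one processing slot per TaskManager process, ten parallel operator instances across the cluster):

```yaml
# flink-conf.yaml: one working thread (slot) per TaskManager
taskmanager.numberOfTaskSlots: 1
# default operator parallelism, i.e. 10 slots used cluster-wide
parallelism.default: 10
```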

Re: [DISCUSS] "Who's hiring on Flink" monthly thread in the mailing lists?

2016-12-09 Thread Timur Shenkao
Hi there, 1) it's a perfect idea 2) as I understand, such information can be placed neither on flink.apache.org nor on data-artisans.com 3) there are tons of sites; who would take care of a dedicated site? how not to forget about its

Re: How to analyze space usage of Flink algorithms

2016-12-09 Thread Chesnay Schepler
We do not measure how much data we are spilling to disk. On 09.12.2016 14:43, Fabian Hueske wrote: Hi, the heap mem usage should be available via Flink's metrics system. Not sure if that also captures spilled data. Chesnay (in CC) should know that. If the spilled data is not available as a

Re: How to analyze space usage of Flink algorithms

2016-12-09 Thread Fabian Hueske
Hi, the heap mem usage should be available via Flink's metrics system. Not sure if that also captures spilled data. Chesnay (in CC) should know that. If the spilled data is not available as a metric, you can try to write a small script that monitors the directories to which Flink spills (Config
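Fabian's suggestion of a small script that monitors the directories Flink spills to (configured via taskmanager.tmp.dirs) can be sketched in self-contained Java; the directory here is a temp dir created just for the demonstration, standing in for a real spill directory:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class SpillDirSize {
    // Sum the sizes of all regular files under dir, recursively.
    static long bytesUnder(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try { return Files.size(p); }
                            catch (IOException e) { return 0L; }
                        })
                        .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo directory standing in for a Flink spill dir (taskmanager.tmp.dirs).
        Path dir = Files.createTempDirectory("flink-spill-demo");
        Files.write(dir.resolve("spill-0.tmp"), new byte[1024]);
        Files.write(dir.resolve("spill-1.tmp"), new byte[2048]);

        // A monitoring script would sample this periodically and record the peak.
        System.out.println("spilled bytes: " + bytesUnder(dir));
    }
}
```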

Re: Flink survey by data Artisans

2016-12-09 Thread Mike Winters
Hi everyone, A quick heads-up that we'll be closing the Flink user survey to new responses this coming Monday 12 Dec around 9am EST. If you'd still like to respond before Monday, you can do so here: http://www.surveygizmo.com/s3/3166399/181bdb611f22. We've seen more than 100 responses so far.

Re: [DISCUSS] "Who's hiring on Flink" monthly thread in the mailing lists?

2016-12-09 Thread Felix Neutatz
Hi, I wonder whether a mailing list is a good choice for that in general. If I am looking for a job I won't register for a mailing list or browse through the archive of one, but rather search via Google. So what about putting it on a dedicated page on the website? This feels more intuitive to

How to analyze space usage of Flink algorithms

2016-12-09 Thread otherwise777
Currently I'm doing some analysis of some algorithms that I use in Flink; I'm interested in the space and time it takes to execute them. For the time I used getNetRuntime() on the ExecutionEnvironment, but I have no idea how to analyze the amount of space an algorithm uses. Space can mean

Re: [DISCUSS] "Who's hiring on Flink" monthly thread in the mailing lists?

2016-12-09 Thread Ufuk Celebi
On 9 December 2016 at 14:13:14, Robert Metzger (rmetz...@apache.org) wrote: > I'm against using the news@ list for that. > The promise of the news@ list is that its low-traffic and only for news. If > we now start having job offers (and potentially some questions on them > etc.) it'll be a list

Re: [DISCUSS] "Who's hiring on Flink" monthly thread in the mailing lists?

2016-12-09 Thread Robert Metzger
I'm against using the news@ list for that. The promise of the news@ list is that it's low-traffic and only for news. If we now start having job offers (and potentially some questions on them etc.) it'll be a list with more than some announcements. That's also the reason why the news@ list is

RE: OutOfMemory when looping on dataset filter

2016-12-09 Thread LINZ, Arnaud
Hi, It works with a local cluster; I am indeed using a YARN cluster here. Pushing user code to the lib folder of every datanode is not convenient; it’s hard to maintain and operate. If I cannot make the treatment serializable to put everything in a group reduce function, I think I’ll try

Re: separation of JVMs for different applications

2016-12-09 Thread Till Rohrmann
Hi Manu, afaik there is no JIRA for standalone v2.0 yet. So feel free to open a JIRA for it. Just a small correction: FLIP-6 is not almost finished yet. But we're working on it and are happy for every helping hand :-) Cheers, Till On Fri, Dec 9, 2016 at 2:27 AM, Manu Zhang

Re: OutOfMemory when looping on dataset filter

2016-12-09 Thread Stephan Ewen
Hi Arnaud! I assume you are using either a standalone setup or a YARN session? This looks to me as if classes cannot be properly garbage collected. Since each job (each day is executed as a separate job) loads the classes again, the PermGen space runs over if classes are not properly

Re: conditional dataset output

2016-12-09 Thread lars . bachmann
Hi Chesnay, I actually thought about the same but like you said it seems a bit hacky ;-). Anyway thank you! Regards, Lars Am 08.12.2016 16:47 schrieb Chesnay Schepler: Hello Lars, The only other way i can think of how this could be done is by wrapping the used outputformat in a custom

RE: OutOfMemory when looping on dataset filter

2016-12-09 Thread LINZ, Arnaud
Hi, Caching could have been a solution. Another one is using a “group reduce” by day, but for that I need to make the “applyComplexNonDistributedTreatment” serializable, and that’s not an easy task. 1 & 2 - The OOM in my current test occurs in the 8th iteration (7 were successful). In this

Re: OutOfMemory when looping on dataset filter

2016-12-09 Thread Fabian Hueske
Hi Arnaud, Flink does not cache data at the moment. What happens is that for every day, the complete program is executed, i.e., also the program that computes wholeSet. Each execution should be independent of the others, and all temporary data should be cleaned up. Since Flink executes programs in a

OutOfMemory when looping on dataset filter

2016-12-09 Thread LINZ, Arnaud
Hello, I have a non-distributed processing step to apply to a DataSet of timed events, one day after another, in a Flink batch job. My algorithm is: // wholeSet is too big to fit in RAM with a collect(), so we cut it in pieces DataSet wholeSet = [Select WholeSet]; for (day 1 to 31) {
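The per-day loop sketched in the message above re-executes the whole plan once per day; the replies in this thread point toward a single pass that groups by day instead (Flink's `groupBy(...).reduceGroup(...)` in a real job). A self-contained sketch contrasting the two shapes on toy data (the Event type and payloads are invented for the illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class PerDayGrouping {
    // A timed event: day-of-month plus some payload.
    static class Event {
        final int day;
        final String payload;
        Event(int day, String payload) { this.day = day; this.payload = payload; }
    }

    public static void main(String[] args) {
        List<Event> wholeSet = Arrays.asList(
                new Event(1, "a"), new Event(1, "b"), new Event(2, "c"));

        // Shape 1 (the loop from the thread): one full pass over wholeSet per day.
        long passes = 0;
        for (int day = 1; day <= 31; day++) {
            final int d = day;
            wholeSet.stream().filter(e -> e.day == d).count();
            passes++;
        }

        // Shape 2 (single pass, analogous to groupBy(day).reduceGroup(...)):
        Map<Integer, List<Event>> byDay = wholeSet.stream()
                .collect(Collectors.groupingBy(e -> e.day, TreeMap::new,
                                               Collectors.toList()));

        System.out.println("passes in loop: " + passes);
        System.out.println("days with events: " + byDay.keySet());
    }
}
```

The second shape touches the data once, which is why the thread's suggestion of a group reduce avoids re-running the wholeSet computation 31 times.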

Re: [DISCUSS] "Who's hiring on Flink" monthly thread in the mailing lists?

2016-12-09 Thread Kostas Tzoumas
I appreciate the concern Kanstantsin! We do have a news@ mailing list, but it has been under-utilized so far. Perhaps revamping that one would do it? My only concern is that subscribing to a new mailing list is an overhead. As a temp solution, we could cc the dev and user list in the first few