Re: Rapidly failing job eventually causes "Not enough free slots"

2017-01-20 Thread Shannon Carey
In fact, I can see all my job jar blobs and some checkpoint & job graph files in my configured "recovery.zookeeper.storageDir"… however for some reason it didn't get restored when my new Flink cluster started up. From: Shannon Carey mailto:sca...@expedia.com>> Date: Friday, January 20, 2017 at

Re: Rapidly failing job eventually causes "Not enough free slots"

2017-01-20 Thread Shannon Carey
I recently added some better visibility into the metrics we're gathering from Flink. My Flink cluster died again due to the "Not enough free slots available to run the job" problem, and this time I can see that the number of registered task managers went down from 11 to 7, then waffled and only

Re: Kryo Deserializer

2017-01-20 Thread 小多
Hi Biswajit, You can follow this is: https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/best_practices.html#register-a-custom-serializer-for-your-flink-program Best regards, Duo On Sat, Jan 21, 2017 at 9:15 AM, Biswajit Das wrote: > Hello, > > Having an issue with nested protob

Kryo Deserializer

2017-01-20 Thread Biswajit Das
Hello, Having an issue with nested protobuf deserialization, event tried with register the class with Kryo like beloe but seems like no help , one of the options left for me is to write a custom serializer or convert the byte array to a Dictionary object . *val clazz = Class.forName("java.util.

Re: Rolling sink parquet/Avro output

2017-01-20 Thread Biswajit Das
Thank for the mail Bruno !! On Wed, Jan 18, 2017 at 1:10 AM, Bruno Aranda wrote: > Sorry, something went wrong with the code for the Writer. Here it is again: > > import org.apache.avro.Schema > import org.apache.flink.streaming.connectors.fs.Writer > import org.apache.hadoop.fs.{FileSystem, Pat

Re: Apache Flink 1.1.4 - Gelly - LocalClusteringCoefficient - Returning values above 1?

2017-01-20 Thread Greg Hogan
Hi Miguel, The '--output print' option describes the values and also displays the local clustering coefficient value. You're running the undirected algorithm on a directed graph. In 1.2 there is an option '--simplify true' that will add reverse edges and remove duplicate edges and self-loops. Alt

Re: Apache Flink 1.1.4 - Gelly - LocalClusteringCoefficient - Returning values above 1?

2017-01-20 Thread Vasiliki Kalavri
Hi Miguel, the LocalClusteringCoefficient algorithm returns a DataSet of type Result, which basically wraps a vertex id, its degree, and the number of triangles containing this vertex. The number 11 you see is indeed the degree of vertex 5113. The Result type contains the method getLocalClustering

Apache Flink 1.1.4 - Gelly - LocalClusteringCoefficient - Returning values above 1?

2017-01-20 Thread Miguel Coimbra
Hello, In the documentation of the LocalClusteringCoefficient algorithm, it is said: *The local clustering coefficient measures the connectedness of each vertex’s neighborhood.Scores range from 0.0 (no edges between neighbors) to 1.0 (neighborhood is a clique).* https://ci.apache.org/projects/f

Re: Re: NPE in JobManager

2017-01-20 Thread Dave Marion
Fixing my accumulator did the trick. I should note that the JobManager did not fail when I ran this previously against Flink 1.1.3. Thanks for the help! Dave > On January 20, 2017 at 8:45 AM Dave Marion wrote: > > I do see that message in one of the task manager logs 20ms before the NPE

Re: Rate-limit processing

2017-01-20 Thread Till Rohrmann
Hi Florian, any blocking of the user code thread is in general a not so good idea because the checkpointing happens under the very same lock which also guards the user code invocation. Thus any checkpoint barrier arriving at the operator has only the chance to trigger the checkpointing once the bl

Re: Cluster failure after zookeeper glitch.

2017-01-20 Thread Till Rohrmann
Hi Andrew, if the ZooKeeper cluster fails and Flink is not able to connect to a functioning quorum again, then it will basically stop working because the JobManagers are no longer able to elect a leader among them. The lost leadership of the JobManager can be seen in the logs (=> expected leader s

Re: Re: NPE in JobManager

2017-01-20 Thread Dave Marion
I do see that message in one of the task manager logs 20ms before the NPE in the JobManager. Looking in that log, there is a ConcurrentModificationException in TreeMap, which my accumulator uses. I'll track this down, thanks for the pointer. > On January 20, 2017 at 8:27 AM Stephan Ewen wrote

Re: Re: NPE in JobManager

2017-01-20 Thread Stephan Ewen
Hi! My current assumption is that there is an accumulator that cannot be serialized. The SortedStringAccumulator looks fine at a first glance, but are there other accumulators involved? Do you see a message like that one in the log of one of the TaskManagers "Failed to serialize accumulators for

Re: Re: NPE in JobManager

2017-01-20 Thread Dave Marion
Stephan, Thanks for looking at this. Could you elaborate on the misbehavior in the accumulator? I'd like to fix it if it's incorrect. Dave > On January 20, 2017 at 4:29 AM Stephan Ewen wrote: > > Hi! > > It seems that the accumulator behaves in a non-standard way, but the > JobMana

Re: Streaming file source?

2017-01-20 Thread Niels Basjes
Thanks! This sounds really close to what I had in mind. I'll use this first and see how far I get. Niels On Fri, Jan 20, 2017 at 11:27 AM, Stephan Ewen wrote: > Hi Niels! > > There is the Continuous File Monitoring Source, used via > > StreamExecutionEnvironment.readFile(FileInputFormat > inpu

Re: keyBy called twice. Second time, INetAddress and Array[Byte] are empty

2017-01-20 Thread Jonas
Hey Jamie, It turns out you were right :) I wrote my own implementation of IPAddress and then it worked. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/keyBy-called-twice-Second-time-INetAddress-and-Array-Byte-are-empty-tp10907p11179.html S

Re: Rate-limit processing

2017-01-20 Thread Yassine MARZOUGUI
Hi, You might find this similar thread from the mailing list archive helpful : http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/throttled-stream-td6138.html . Best, Yassine 2017-01-20 10:53 GMT+01:00 Florian König : > Hi, > > i need to limit the rate of processing in a Flink

Re: Streaming file source?

2017-01-20 Thread Stephan Ewen
Hi Niels! There is the Continuous File Monitoring Source, used via StreamExecutionEnvironment.readFile(FileInputFormat inputFormat, String filePath, FileProcessingMode watchType, long interval); This can be used to both continuously ingest from files, or to read files once. Kostas can probably

Streaming file source?

2017-01-20 Thread Niels Basjes
Hi, For testing and optimizing a streaming application I want to have a "100% accurate repeatable" substitute for a Kafka source. I was thinking of creating a streaming source class that simply reads the records from a (static unchanging) set of files. Each file would then produce the data which (

Re: Re: NPE in JobManager

2017-01-20 Thread Stephan Ewen
I opened this issue: https://issues.apache.org/jira/browse/FLINK-5585 Assuming the bug is what I think it is (cannot be 100% sure from just the small stack trace sample) it should be fixed soon... On Fri, Jan 20, 2017 at 10:29 AM, Stephan Ewen wrote: > Hi! > > It seems that the accumulator beha

Rate-limit processing

2017-01-20 Thread Florian König
Hi, i need to limit the rate of processing in a Flink stream application. Specifically, the number of items processed in a .map() operation has to stay under a certain maximum per second. At the moment, I have another .map() operation before the actual processing, which just sleeps for a certa

Re: Cluster failure after zookeeper glitch.

2017-01-20 Thread Stefan Richter
I would think that network problems between Flink and Zookeeper in HA mode could indeed lead to problems. Maybe Till (in CC) has a better idea of what is going on there). > Am 19.01.2017 um 14:55 schrieb Andrew Ge Wu : > > Hi Stefan > > Yes we are running in HA mode with dedicated zookeeper cl

Re: Re: NPE in JobManager

2017-01-20 Thread Stephan Ewen
Hi! It seems that the accumulator behaves in a non-standard way, but the JobManager should also catch that (log a warning or debug message) and simply continue (not crash). I'll try to add a patch that the JobManager tolerates these kinds of issues in the accumulators. Stephan On Thu, Jan 19,