Re: weird client failure/timeout

2017-01-23 Thread Abhishek Singh
Hi Stephan, This did not work. For the working case I do see a better utilization of available slots. However the non working case still doesn't work. Basically I assigned a unique group to the sources in my for loop - given I have way more slots than the parallelism I seek. I know about the

multi tenant workflow execution

2017-01-23 Thread Chen Qin
Hi there, I am researching running one flink job to support customized event driven workflow executions. The use case is to support running various workflows that listen to a set of kafka topics and performing various rpc checks, a user travel through multiple stages in a rule execution(workflow

Re: Expected behaviour of windows

2017-01-23 Thread Abdul Salam Shaikh
It was an exception because I had missed the clear() function within my CustomTrigger. It works as expected now. Thanks for all the help :) On Tue, Jan 24, 2017 at 12:23 AM, Abdul Salam Shaikh < abd.salam.sha...@gmail.com> wrote: > This is my definiton of the trigger for more clarity into the

Re: Expected behaviour of windows

2017-01-23 Thread Abdul Salam Shaikh
This is my definiton of the trigger for more clarity into the issue I am running: @Override public TriggerResult onElement(FlatObject t, long l, Window w, TriggerContext tc) throws Exception { long currentTimeInCycle = t.getCurrentTimeInCycle(); if (lastKnownCurrentTimeInCycle

TestStreamEnvironment: await last flush of processing time-based windows

2017-01-23 Thread Steven Ruppert
Hi, I'm attempting to unit test link with the flink-test-utils support, on flink 1.1.4. I've got basic flatMap stuff flowing through just fine, but when running any processing time-based windowing functions, `env.execute()` will return before any values are flushed out of the windows. import

Re: Expected behaviour of windows

2017-01-23 Thread Abdul Salam Shaikh
Thank you Jonas, I am using version *1.2-SNAPSHOT* of Apache Flink to leverage the advanced Evictor class. However, while trying to use FIRE_AND_PURGE I am getting the following error: java.lang.UnsupportedOperationException: Not supported yet. at

Re: weird client failure/timeout

2017-01-23 Thread Stephan Ewen
Hi! I think what you are seeing is the effect of too mans tasks going to the same task slot. Have a look here: https://ci.apache.org/projects/flink/flink-docs-release-1.2/concepts/runtime.html#task-slots-and-resources By default, Flink shares task slots across all distinct pipelines of the same

Re: Better way to read several stream sources

2017-01-23 Thread Stephan Ewen
Why don't you define one source and make it parallel? You can implement "RichParallelSourceFunction" and use that to check which parallel subtask the respective source is during execution. On Mon, Jan 23, 2017 at 6:55 PM, Sendoh wrote: > Hi Flink users, > > Can I ask

Re: Is there any difference between Checkpointed and CheckpointedAsynchronously?

2017-01-23 Thread Stephan Ewen
Hi! "CheckpointedAsynchronously" was a hint to the system that the state objects could be persisted asynchronously, i.e., that the stateful function could process data concurrently to the persisting operation. However, this path was never implemented, so the hint was never used. Flink 1.2 has

Re: Release 1.2?

2017-01-23 Thread Stephan Ewen
Hi Denis! For windows, it should not matter if you pick a Hadoop 1 or Hadoop 2 flavor. Stephan On Mon, Jan 23, 2017 at 5:36 PM, wrote: > Ok, thanks for the link. > > > > This makes sense. > > > > I don’t know why I had in mind that running Flink on Windows

Re: Improving Flink performance

2017-01-23 Thread Ted Yu
After "My job looks like this:", it was empty. Please consider using third party site for images. Cheers On Mon, Jan 23, 2017 at 10:03 AM, Jonas wrote: > I received it well-formatted. May it be that the issue is your Mail reader? > > > > -- > View this message in context:

Re: Improving Flink performance

2017-01-23 Thread Jonas
I received it well-formatted. May it be that the issue is your Mail reader? -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Improving-Flink-performance-tp11211p11225.html Sent from the Apache Flink User Mailing List archive. mailing list

Better way to read several stream sources

2017-01-23 Thread Sendoh
Hi Flink users, Can I ask is what would be the better way to read multiple stream sources? I have a FooSource which implements SourceFunction and reads one source, and would like to read several FooSource. FooSource basically reads data as stream by http call. Option1: Use a for-loop to read

Custom Partitioning and windowing questions/concerns

2017-01-23 Thread Katsipoulakis, Nikolaos Romanos
Hello all, Currently, I examine the effects of stream partitioning on performance for simple state-full scenarios. My toy application for the rest of my question will be the following: A stream of non-negative integers, each one annotated with a timestamp, and the goal is to get the top-10

Re: weird client failure/timeout

2017-01-23 Thread Abhishek R. Singh
Actually, I take it back. It is the last union that is causing issues (of job being un-submittable). If I don’t conbineAtEnd, I can go higher (at least deploy the job), all the way up to 63. After that it starts failing in too many files open in Rocks DB (which I can understand and is at

RE: Release 1.2?

2017-01-23 Thread denis.dollfus
Ok, thanks for the link. This makes sense. I don’t know why I had in mind that running Flink on Windows implied using the hadoop1 flavor of Flink – that’s very likely unrelated, please correct me if I’m wrong. Regards, Denis From: Greg Hogan [mailto:c...@greghogan.com] Sent: lundi 23 janvier

Re: weird client failure/timeout

2017-01-23 Thread Abhishek R. Singh
Is there a limit on how many DataStreams can be defined in a streaming program? Looks like flink has problems handling too many data streams? I simplified my topology further. For eg, this works (parallelism of 4) However, when I try to go beyond 51 (found empirically by parametrizing

Re: Improving Flink performance

2017-01-23 Thread Greg Hogan
Hi Jonas, It looks like the mailing list has removed your formatting and/or attachments. Greg On Mon, Jan 23, 2017 at 6:08 AM, Jonas wrote: > Hello! > > I'm having performance problems with a Flink job. If there is anything > valuable missing, please ask and I will try to

Re: Release 1.2?

2017-01-23 Thread Greg Hogan
Support for Hadoop 1 was dropped in FLINK-4895 [1]. [1] https://issues.apache.org/jira/browse/FLINK-4895 On Mon, Jan 23, 2017 at 11:09 AM, wrote: > I notice that for Flink 1.2.0 there is no x.y.z-hadoop1 folder (Cf. Apache > staging > repo >

RE: Release 1.2?

2017-01-23 Thread denis.dollfus
I notice that for Flink 1.2.0 there is no x.y.z-hadoop1 folder (Cf. Apache staging repo). This is very useful to me for developing on Windows. Is it temporary, for 1.2 RCs only? Regards, Denis From:

Re: Queryable State and Windows

2017-01-23 Thread Ufuk Celebi
This is not possible at the moment. We discussed this a couple of times before, but in the end did not want to expose it with the initial version, because the interfaces are still very raw. This is definitely on the agenda though. As a work around you would have to build a custom Flink version

Is there any difference between Checkpointed and CheckpointedAsynchronously?

2017-01-23 Thread Tao Meng
Hi all, When I read the source code about checkpoint, I can't find any difference between Checkpointed and CheckpointedAsynchronously. I can not sure that I am correct. The doc about `Asynchronous State Snapshots` is commented out in markdown file. I want to know if I am right (there no

Re: weird client failure/timeout

2017-01-23 Thread Abhishek R. Singh
I even make it 10 minutes: akka.client.timeout: 600s But doesn’t feel like it is taking effect. It still comes out at about the same time with the same error. -Abhishek- > On Jan 23, 2017, at 6:04 AM, Abhishek R. Singh > wrote: > > yes, I had increased it

Re: weird client failure/timeout

2017-01-23 Thread Abhishek R. Singh
yes, I had increased it to 5 minutes. It just sits there and bails out again. > On Jan 23, 2017, at 1:47 AM, Jonas wrote: > > The exception says that > > Did you already try that? > > > > -- > View this message in context: >

Queryable State and Windows

2017-01-23 Thread Joe Olson
>From what I've read in the documentation, and from the examples I've seen, in >order to make state queryable externally to Flink, the state descriptor >variables need access to the Flink runtime context. This means the stream processor has to have access to the 'Rich' level objects -

Improving Flink performance

2017-01-23 Thread Jonas
Hello! I'm having performance problems with a Flink job. If there is anything valuable missing, please ask and I will try to answer ASAP. My job looks like this: First, I read data from Kafka. This is very fast at 100k msgs/s. The data is decoded, a type is added (we have multiple message

Flink configuration

2017-01-23 Thread Nancy Estrada
Hi all, I have been reading about how to configure Flink when we have a set up consisting on a couple of VMs with more than 1 vCore. I am a bit confused about how to set the degree of parallelism in the taskmanager.numberOfTaskSlots parameter: * According to the Flink documentation[1], this

Re: Apache Flink 1.1.4 - Gelly - LocalClusteringCoefficient - Returning values above 1?

2017-01-23 Thread Vasiliki Kalavri
Hi Miguel, I don't think you're doing anything wrong. The NaN values you are getting are there because of your data. The LCC value is computed as #number_of_triangles / #number_of_triples, where #number_of_triples is [n*(n-1)]/2 for a vertex with n neighbors. It looks like there are no triangles

Re: Count window on partition

2017-01-23 Thread Fabian Hueske
Hi Dmitry, the third version is the way to go, IMO. You might want to have a larger number of partitions if you are planning to later increase the parallelism of the job. Also note, that it is not guaranteed that 4 keys are uniformly distributed to 4 tasks. It might happen that one task ends up

Re: Count window on partition

2017-01-23 Thread Kostas Kloudas
Hi Dmitry, In all cases, the result of the countWindow will be also grouped by key because of the keyBy() that you are using. If you want to have a non-keyed stream and then split it in count windows, remove the keyBy() and instead of countWindow(), use countWindowAll(). This will have

Count window on partition

2017-01-23 Thread Dmitry Golubets
Hi, I'm looking for the right way to do the following scheme: 1. Read data 2. Split it into partitions for parallel processing 3. In every partition group data in N elements batches 4. Process these batches My first attempt was: *dataStream.keyBy(_.key).countWindow(..)* But countWindow groups

Re: 答复: [DISCUSS] Apache Flink 1.2.0 RC0 (Non-voting testing release candidate)

2017-01-23 Thread Robert Metzger
Hi all, I would like to do a proper voting RC1 early this week. >From the issues mentioned here, most of them have pull requests or were changed to a lower priority. Once we've merged all outstanding PRs, I'll create the next RC. Regards, Robert On Mon, Jan 16, 2017 at 12:13 PM, Fabian Hueske

Re: weird client failure/timeout

2017-01-23 Thread Abhishek R. Singh
I am using version 1.1.4 (latest stable) > On Jan 23, 2017, at 12:41 AM, Abhishek R. Singh > wrote: > > I am trying to construct a topology like this (shown for parallelism of 4) - > basically n parallel windowed processing sub-pipelines with single source and

weird client failure/timeout

2017-01-23 Thread Abhishek R. Singh
I am trying to construct a topology like this (shown for parallelism of 4) - basically n parallel windowed processing sub-pipelines with single source and single sink: I am getting the following failure (if I go beyond 28 - found empirically using binary search). There is nothing in the job