Re: checkpoint notifier not found?

2016-12-12 Thread Abhishek R. Singh
https://issues.apache.org/jira/browse/FLINK-5323 > On Dec 12, 2016, at 5:37 AM, Till Rohrmann wrote: > > Hi Abhishek, > > great to hear that you like to become part of the Flink community. Here are > some information

use case for sliding windows

2016-12-12 Thread Meghashyam Sandeep V
Hi There, I have a streaming job which has source as Kafka and sink as Cassandra. I have a use case where I wouldn't want to write some events to Cassandra when there are more than 100 events for a given 'id' (field in my Pojo) in 5mins. Is this a good use case for SlidingWindows? Can I get the
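The rate check described above (drop writes once an id has seen more than 100 events in a trailing 5-minute window) can be sketched in plain Java, without Flink dependencies, to show the counting logic. In a real job this would map to `keyBy(id)` plus a 5-minute sliding window before the Cassandra sink; the class and method names here are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sketch only: suppress writes once an id has seen more than `limit`
// events inside the trailing `windowMillis` window.
class SlidingCounter {
    private final long windowMillis;
    private final int limit;
    private final Map<String, Deque<Long>> timestamps = new HashMap<>();

    SlidingCounter(long windowMillis, int limit) {
        this.windowMillis = windowMillis;
        this.limit = limit;
    }

    /** Records an event and returns true if the sink should still write it. */
    boolean shouldWrite(String id, long eventTimeMillis) {
        Deque<Long> q = timestamps.computeIfAbsent(id, k -> new ArrayDeque<>());
        // Evict events that fell out of the trailing window.
        while (!q.isEmpty() && q.peekFirst() <= eventTimeMillis - windowMillis) {
            q.pollFirst();
        }
        q.addLast(eventTimeMillis);
        return q.size() <= limit;
    }
}
```

Whether a sliding window or a keyed process function with timers is the better fit depends on whether the 5-minute window should be aligned or strictly per-event trailing.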

Re: Reg. custom sinks in Flink

2016-12-12 Thread Chesnay Schepler
Hello, the query is generated automatically from the pojo by the datastax MappingManager in the CassandraPojoSink; Flink isn't generating anything itself. On the MappingManager you can set the TTL for all queries (it also allows some other stuff). So, to allow the user to set the TTL we

Flink 1.1.3 RollingSink - mismatch in the number of records consumed/produced

2016-12-12 Thread Dominik Safaric
Hi everyone, As I’ve implemented a RollingSink writing messages consumed from a Kafka log, I’ve observed that there is a significant mismatch in the number of messages consumed and written to file system. Namely, the consumed Kafka topic contains in total 1,000,000 messages. The topology does

Re: How to retrieve values from yarn.taskmanager.env in a Job?

2016-12-12 Thread Shannon Carey
Hi Till, Yes, System.getenv() was the first thing I tried. It'd be great if someone else can reproduce the issue, but for now I'll submit a JIRA with the assumption that it really is not working right. https://issues.apache.org/jira/browse/FLINK-5322 -Shannon From: Till Rohrmann

Re: How to retrieve values from yarn.taskmanager.env in a Job?

2016-12-12 Thread Shannon Carey
Hi Chesnay, Since that configuration option is supposed to apply the environment variables to the task managers, I figured it would definitely be available within the stream operators. I'm not sure whether the job plan runs within a task manager or not, but hopefully it does? In my particular

Re: Reg. custom sinks in Flink

2016-12-12 Thread Meghashyam Sandeep V
Thank you Till. I wanted to contribute towards Flink. Looks like this could be a good start. I couldn't find the place where the insert query is built for Pojo sinks in CassandraSink.java, CassandraPojoSink.java, or CassandraSinkBase.java. Could you shed some light on how that insert query is

Re: Incremental aggregations - Example not working

2016-12-12 Thread Matt
Err, I meant if I'm not wrong * On Mon, Dec 12, 2016 at 2:02 PM, Matt wrote: > I just checked with version 1.1.3 and it works fine, the problem is that > in that version we can't use Kafka 0.10 if I'm not work. Thank you for the > workaround! > > Best, > Matt > > On Mon,

Re: Incremental aggregations - Example not working

2016-12-12 Thread Matt
I just checked with version 1.1.3 and it works fine, the problem is that in that version we can't use Kafka 0.10 if I'm not work. Thank you for the workaround! Best, Matt On Mon, Dec 12, 2016 at 1:52 PM, Yassine MARZOUGUI < y.marzou...@mindlytix.com> wrote: > Yes, it was suppoed to work. I

Re: Incremental aggregations - Example not working

2016-12-12 Thread Yassine MARZOUGUI
Yes, it was supposed to work. I looked into this, and as Chesnay said, this is a bug in the fold function. I opened an issue in JIRA: https://issues.apache.org/jira/browse/FLINK-5320, and will fix it very soon, thank you for reporting it. In the meantime you can work around the problem by

Re: Avro Parquet/Flink/Beam

2016-12-12 Thread Jean-Baptiste Onofré
Hi Billy, I will push my branch with ParquetIO on my github. Yes, the Beam IO is independent from the runner. Regards JB On 12/12/2016 05:29 PM, Newport, Billy wrote: I don't mind writing one, is there a fork for the ParquetIO works that's already been done or is it in trunk? The ParquetIO

RE: Avro Parquet/Flink/Beam

2016-12-12 Thread Newport, Billy
I don't mind writing one. Is there a fork for the ParquetIO work that's already been done, or is it in trunk? The ParquetIO is independent of the runner being used? Is that right? Thanks -Original Message- From: Jean-Baptiste Onofré [mailto:j...@nanthrax.net] Sent: Monday, December

Re: Avro Parquet/Flink/Beam

2016-12-12 Thread Jean-Baptiste Onofré
Hi, Beam provides a AvroCoder/AvroIO that you can use, but not yet a ParquetIO (I created a Jira about that and started to work on it). You can use the Avro reader to populate the PCollection and then use a custom DoFn to create the Parquet (waiting for the ParquetIO). Regards JB On

Avro Parquet/Flink/Beam

2016-12-12 Thread Newport, Billy
Are there any examples showing the use of beam with avro/parquet and a flink runner? I see an avro reader for beam, is it a matter of writing another one for avro-parquet or does this need to use the flink HadoopOutputFormat for example? Thanks Billy

Re: Reg. custom sinks in Flink

2016-12-12 Thread Till Rohrmann
(1) A subtask is a parallel instance of an operator and thus responsible for a partition (possibly infinite) of the whole DataStream/DataSet. (2) Maybe you can add this feature to Flink's Cassandra Sink. Cheers, Till On Mon, Dec 12, 2016 at 4:30 PM, Meghashyam Sandeep V <

Re: Incremental aggregations - Example not working

2016-12-12 Thread Matt
I'm using 1.2-SNAPSHOT, should it work in that version? On Mon, Dec 12, 2016 at 12:18 PM, Yassine MARZOUGUI < y.marzou...@mindlytix.com> wrote: > Hi Matt, > > What version of Flink are you using? > The incremental agregation with fold(ACC, FoldFunction, WindowFunction) > in a new change that

Re: Reg. custom sinks in Flink

2016-12-12 Thread Meghashyam Sandeep V
Data piles up in Cassandra without TTL. Is there a workaround for this problem? Is there a way to specify my query and still use Pojo? Thanks, On Mon, Dec 12, 2016 at 7:26 AM, Chesnay Schepler wrote: > Regarding 2) I don't think so. That would require access to the datastax

Re: Reg. custom sinks in Flink

2016-12-12 Thread Chesnay Schepler
Regarding 2) I don't think so. That would require access to the datastax MappingManager. We could add something similar as the ClusterBuilder for that though. Regards, Chesnay On 12.12.2016 16:15, Meghashyam Sandeep V wrote: Hi Till, Thanks for the information. 1. What do you mean by

Re: Incremental aggregations - Example not working

2016-12-12 Thread Yassine MARZOUGUI
Hi Matt, What version of Flink are you using? The incremental aggregation with fold(ACC, FoldFunction, WindowFunction) is a new change that will be part of Flink 1.2; for Flink 1.1 the correct way to perform incremental aggregations is: apply(ACC, FoldFunction, WindowFunction) (see the docs
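The semantics being discussed (fold each element into an accumulator incrementally, then hand only the final accumulator to the window function when the window fires) can be illustrated without Flink at all. This is a plain-Java sketch of what fold(initialValue, foldFunction, windowFunction) computes per window; it is not the Flink API itself.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

// Sketch of incremental window aggregation: the fold function runs per
// element, the window function runs once on the final accumulator.
class IncrementalFold {
    static <T, ACC, R> R foldWindow(
            List<T> windowElements,
            ACC initial,
            BiFunction<ACC, T, ACC> foldFunction,
            Function<ACC, R> windowFunction) {
        ACC acc = initial;
        for (T element : windowElements) {
            acc = foldFunction.apply(acc, element); // incremental aggregation
        }
        return windowFunction.apply(acc);           // fires once per window
    }
}
```

The point of this shape is that the full window contents never need to be buffered: only the accumulator is kept as state.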

Re: Reg. custom sinks in Flink

2016-12-12 Thread Meghashyam Sandeep V
Hi Till, Thanks for the information. 1. What do you mean by 'subtask', is it every partition or every message in the stream? 2. I tried using CassandraSink with a Pojo. Is there a way to specify TTL as I can't use a query when I have a datastream with Pojo? CassandraSink.addSink(messageStream)

Re: Incremental aggregations - Example not working

2016-12-12 Thread Matt
In case this is important, if I remove the WindowFunction, and only use the FoldFunction it works fine. I don't see what is wrong... On Mon, Dec 12, 2016 at 10:53 AM, Matt wrote: > Hi, > > I'm following the documentation [1] of window functions with incremental >

Incremental aggregations - Example not working

2016-12-12 Thread Matt
Hi, I'm following the documentation [1] of window functions with incremental aggregations, but I'm getting an "input mismatch" error. The code [2] is almost identical to the one in the documentation, at the bottom you can find the exact error. What am I missing? Can you provide a working

Re: checkpoint notifier not found?

2016-12-12 Thread Till Rohrmann
Hi Abhishek, great to hear that you like to become part of the Flink community. Here are some information for how to contribute [1]. [1] http://flink.apache.org/how-to-contribute.html Cheers, Till On Mon, Dec 12, 2016 at 12:36 PM, Abhishek Singh < abhis...@tetrationanalytics.com> wrote: >

Re: How to retrieve values from yarn.taskmanager.env in a Job?

2016-12-12 Thread Chesnay Schepler
Hello, can you clarify one small thing for me: Do you want to access this parameter when you define the plan (aka when you call methods on the StreamExecutionEnvironment or DataStream instances) or from within your functions/operators? Regards, Chesnay Schepler On 12.12.2016 14:21, Till

Re: Testing Flink Streaming applications - controlling the clock

2016-12-12 Thread Till Rohrmann
Hi Rohit, it depends a little bit on your tests. If you test individual operators you can use the AbstractStreamOperatorTestHarness class which allows to set the processing time via AbstractStreamOperatorTestHarness#setProcessingTime. You can also set the ProcessingTimeService used by a

Re: How to retrieve values from yarn.taskmanager.env in a Job?

2016-12-12 Thread Till Rohrmann
Hi Shannon, have you tried accessing the environment variables via System.getenv()? This should give you a map of string-string key value pairs where the key is the environment variable name. If your values are not set in the returned map, then this indicates a bug in Flink and it would be great
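The check Till suggests can be wrapped in a small helper that reads the task manager's environment and fails fast with a clear message when an expected variable was not forwarded by YARN. This is a sketch; the variable name FLINK_ENV_EXAMPLE is hypothetical, and the lookup is parameterized over a map so it can be tested without touching the real environment.

```java
import java.util.Map;

// Sketch: look up a required environment variable (normally from
// System.getenv()) and fail with an actionable message if it is missing.
class EnvCheck {
    static String requireEnv(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null) {
            throw new IllegalStateException(
                "Environment variable " + name + " is not set; "
                + "check yarn.taskmanager.env in flink-conf.yaml");
        }
        return value;
    }
}
```

In an operator you would call `requireEnv(System.getenv(), "FLINK_ENV_EXAMPLE")`; if that throws even though the variable is configured, that supports the bug Shannon filed.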

Re: checkpoint notifier not found?

2016-12-12 Thread Abhishek Singh
Will be happy to. Could you guide me a bit in terms of what I need to do? I am a newbie to open source contributing. And currently at Frankfurt airport. When I hit ground will be happy to contribute back. Love the project !! Thanks for the awesomeness. On Mon, Dec 12, 2016 at 12:29 PM Stephan

Re: checkpoint notifier not found?

2016-12-12 Thread Stephan Ewen
Thanks for reporting this. It would be awesome if you could file a JIRA or a pull request for fixing the docs for that. On Sat, Dec 10, 2016 at 2:02 AM, Abhishek R. Singh < abhis...@tetrationanalytics.com> wrote: > I was following the official documentation: https://ci. >

Re: Reg. custom sinks in Flink

2016-12-12 Thread Till Rohrmann
Hi Meghashyam, 1. You can perform initializations in the open method of the RichSinkFunction interface. The open method will be called once for every sub task when initializing it. If you want to share the resource across multiple sub tasks running in the same JVM you can also
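The pattern Till describes (initialize a resource once per subtask in open(), or share one instance across all subtasks in the same JVM) can be sketched with a stand-in class. This is not the real Flink RichSinkFunction API; it only demonstrates the per-subtask versus per-JVM lifecycle.

```java
// Stand-in sketch (not the real Flink API): one local resource per subtask,
// one shared resource per JVM via a double-checked static holder.
class SharedResourceSink {
    // JVM-wide shared resource: every subtask in this task manager reuses it.
    private static volatile Object sharedSession;
    private static final Object LOCK = new Object();

    // Per-subtask resource: one per parallel instance of the sink.
    private Object localSession;
    static int openCalls = 0; // exposed only for the illustration below

    /** Called once when the subtask is initialized (mirrors RichSinkFunction#open). */
    void open() {
        openCalls++;
        localSession = new Object(); // e.g. build a Cassandra session here
        if (sharedSession == null) {
            synchronized (LOCK) {
                if (sharedSession == null) {
                    sharedSession = new Object(); // created once per JVM
                }
            }
        }
    }

    Object shared() { return sharedSession; }
    Object local()  { return localSession; }
}
```

Two "subtasks" (two instances) each get their own local session but see the same shared one, which is the distinction Till draws between per-subtask and per-JVM initialization.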