Re: 1.2 release date

2017-02-07 Thread Robert Metzger
Hi Anton, which contributors list are you referring to? I've included all release contributors in the release announcement. On Mon, Feb 6, 2017 at 11:50 AM, Anton Solovev wrote: > Hi, > > > > Could you update List of contributors after that? :) > > > > *Anton Solovev* > > *Software Engineer* > > >

Re: Regarding Flink as a web service

2017-02-07 Thread Robert Metzger
There's also a nice documentation page on the feature now: https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/queryable_state.html On Tue, Jan 31, 2017 at 6:18 PM, Aljoscha Krettek wrote: > +u...@apache.org Because he implemented queryable state. > > There is also queryable

Re: TaskManager randomly dies

2017-01-28 Thread Robert Metzger
Hi, which Flink version are you using? This issue occurred quite frequently in the 1.2.0 RC0 and should be fixed in later RCs. On Fri, Jan 27, 2017 at 4:13 PM, Malte Schwarzer wrote: > Hi all, > > when running a Flink batch job, from time to time a TaskManager dies > randomly, which makes the fu

Re: User configuration

2017-01-26 Thread Robert Metzger
Hi, Is this what you are looking for? https://ci.apache.org/projects/flink/flink-docs-release-1.2/monitoring/best_practices.html#parsing-command-line-arguments-and-passing-them-around-in-your-flink-application On Thu, Jan 26, 2017 at 5:38 PM, Dmitry Golubets wrote: > Hi, > > Is there a place for

Re: Flink dependencies shading

2017-01-26 Thread Robert Metzger
Hi Dmitry, I think this issue is new. Where is the AWS SDK dependency coming from? Maybe you can resolve the issue on your side for now. I've filed a JIRA for this issue: https://issues.apache.org/jira/browse/FLINK-5661 On Wed, Jan 25, 2017 at 8:24 PM, Dmitry Golubets wrote: > I've build late

Re: Debugging, logging and measuring operator subtask performance

2017-01-26 Thread Robert Metzger
Hi Dominik, You could measure the throughput at each task in your job to see if one operator is causing the slowdown (for example using Flink's metrics system). Maybe the backpressure view already helps to find the task that causes the issue. Did you check if there are enough resources available f

Re: Issues while restarting a job on HA cluster

2017-01-26 Thread Robert Metzger
Hi Ani, This error is independent of cancel vs stop. It's an issue of loading the MapR classes from the classloaders. Do your user jars contain any MapR code (either mapr streams or maprdb)? If so, I would recommend putting these MapR libraries into the "lib/" folder of Flink. They'll then be d

Re: Kafka data not read in FlinkKafkaConsumer09 second time from command line

2017-01-26 Thread Robert Metzger
Hi, I would guess that the watermark generation does not work as expected. I would recommend logging the extracted timestamps + the watermarks to understand how time is progressing, and when watermarks are generated to trigger a window computation. On Tue, Jan 24, 2017 at 6:53 PM, Sujit Sakre wrot

Re: Rate-limit processing

2017-01-26 Thread Robert Metzger
Hi Florian, you can rate-limit the Kafka consumer by implementing a custom DeserializationSchema that sleeps a bit from time to time (or at each deserialization step). On Tue, Jan 24, 2017 at 1:16 PM, Florian König wrote: > Hi Till, > > thank you for the very helpful hints. You are right, I alre
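
The rate-limiting idea in the reply above can be sketched as follows. This is only an illustration under stated assumptions: `SimpleDeserializer` is a hypothetical one-method stand-in for Flink's `DeserializationSchema` so that the example compiles without Flink on the classpath, and `ThrottlingDeserializer`, `recordsPerPause`, and `pauseMillis` are names invented for this sketch, not part of any Flink API.

```java
import java.nio.charset.StandardCharsets;

// Small demo: wrap a UTF-8 deserializer and pause after every second record.
final class ThrottlingDemo {
    public static void main(String[] args) {
        SimpleDeserializer<String> utf8 = bytes -> new String(bytes, StandardCharsets.UTF_8);
        ThrottlingDeserializer<String> throttled = new ThrottlingDeserializer<>(utf8, 2, 50);
        for (String record : new String[] {"a", "b", "c", "d"}) {
            System.out.println(throttled.deserialize(record.getBytes(StandardCharsets.UTF_8)));
        }
        System.out.println("records seen: " + throttled.recordsSeen());
    }
}

// Hypothetical stand-in for a deserialization schema, reduced to one method.
interface SimpleDeserializer<T> {
    T deserialize(byte[] message);
}

// Delegates to another deserializer and sleeps after every `recordsPerPause` records.
final class ThrottlingDeserializer<T> implements SimpleDeserializer<T> {
    private final SimpleDeserializer<T> inner;
    private final int recordsPerPause;
    private final long pauseMillis;
    private long seen;

    ThrottlingDeserializer(SimpleDeserializer<T> inner, int recordsPerPause, long pauseMillis) {
        if (recordsPerPause <= 0 || pauseMillis < 0) {
            throw new IllegalArgumentException("recordsPerPause must be > 0 and pauseMillis >= 0");
        }
        this.inner = inner;
        this.recordsPerPause = recordsPerPause;
        this.pauseMillis = pauseMillis;
    }

    @Override
    public T deserialize(byte[] message) {
        seen++;
        if (seen % recordsPerPause == 0) {
            try {
                // Blocking here slows down whoever is pulling records through this schema.
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag so the caller can react to cancellation.
                Thread.currentThread().interrupt();
            }
        }
        return inner.deserialize(message);
    }

    long recordsSeen() {
        return seen;
    }
}
```

Setting `recordsPerPause` to 1 corresponds to sleeping at each deserialization step; larger values correspond to sleeping from time to time.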

Re: Improving Flink Performance

2017-01-26 Thread Robert Metzger
Hi Jonas, The good news is that your job is completely parallelizable. So if you are running it on a cluster, you can scale it at least to the number of Kafka partitions you have (actually even further, because the Kafka consumers are not the issue). I don't think that the scala (=akka) worker th

Re: Flink with Yarn on MapR

2017-01-25 Thread Robert Metzger
Hi, I think this is a re-post of a question I've already "answered": http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Flink-with-Yarn-on-MapR-td15448.html On Wed, Jan 25, 2017 at 12:12 AM, ani.desh1512 wrote: > Hi, > I am trying to setup flink with Yarn on Mapr cluster. I built f

Re: 答复: [DISCUSS] Apache Flink 1.2.0 RC0 (Non-voting testing release candidate)

2017-01-24 Thread Robert Metzger
RC1 creation is in progress ... On Mon, Jan 23, 2017 at 10:33 AM, Robert Metzger wrote: > Hi all, > > I would like to do a proper voting RC1 early this week. > From the issues mentioned here, most of them have pull requests or were > changed to a lower priority. > Onc

Re: 答复: [DISCUSS] Apache Flink 1.2.0 RC0 (Non-voting testing release candidate)

2017-01-23 Thread Robert Metzger
>> > > > >> > > > >> > On 12.01.2017 16:20, Till Rohrmann wrote: > > >> > > > >> > I also found an issue: > > >> > > > >> > https://issues.apache.org/jira/browse/FLINK-5470 > > >> > > >

Re: Reading compressed XML data

2017-01-14 Thread Robert Metzger
Hi Sebastian, I'm not aware of a better way of implementing this in Flink. You could implement your own XmlInputFormat using Flink's InputFormat abstractions, but you would end up with almost exactly the same code as Mahout / Hadoop. I wonder why the decompression with the XmlInputFormat doesn't w

Re: 1.1.4 on YARN - vcores change?

2017-01-13 Thread Robert Metzger
Hi Shannon, Flink is reading the number of available vcores from the local YARN configuration. Is it possible that the YARN / Hadoop config on the machine where you are submitting your job from sets the number of vcores to 4? On Fri, Jan 13, 2017 at 12:51 AM, Shannon Carey wrote: > Did anythi

Re: 答复: [DISCUSS] Apache Flink 1.2.0 RC0 (Non-voting testing release candidate)

2017-01-12 Thread Robert Metzger
ave another bugfix for 1.2.: > > https://issues.apache.org/jira/browse/FLINK-2662 (pending PR) > > 2017-01-10 15:16 GMT+01:00 Robert Metzger : > > > Hi, > > > > this depends a lot on the number of issues we find during the testing. > > > > > &

Re: 答复: [DISCUSS] Apache Flink 1.2.0 RC0 (Non-voting testing release candidate)

2017-01-10 Thread Robert Metzger
) https://issues.apache.org/jira/browse/FLINK-5381 (resolved) https://issues.apache.org/jira/browse/FLINK-5380 (pending PR) On Tue, Jan 10, 2017 at 11:58 AM, shijinkui wrote: > Do we have a probable time of 1.2 release? This month or Next month? > > -Original Message- > From: Ro

[DISCUSS] Apache Flink 1.2.0 RC0 (Non-voting testing release candidate)

2017-01-03 Thread Robert Metzger
Hi, First of all, I wish everybody a happy new year 2017. I've set user@flink in CC so that users who are interested in helping with the testing get notified. Please respond only to the dev@ list to keep the discussion there! According to the 1.2 release discussion thread, I've created a first r

Re: Problem with JodaTime

2016-12-24 Thread Robert Metzger
Hi Stephan, Can you post the list of fields in the POJO and the full exception (so that I can see which serializer is being used)? In general, to fix such an issue, you have to implement a custom serializer for the field that is causing the issues. On Thu, Dec 22, 2016 at 3:44 PM, Stephan Epping

Re: Can I see the kafka header information in the Flink connector?

2016-12-24 Thread Robert Metzger
Hi Ron, there is a KeyedDeserializationSchema for the Kafka connector, that exposes the source partition, offset and topic. Is that what you are looking for? On Thu, Dec 22, 2016 at 5:33 PM, Ron Crocker wrote: > Looking at the Kafka 0.8 connector API, my deserializer definitely gets > the messa

Re: Standalone cluster layout

2016-12-14 Thread Robert Metzger
Hi Avihai, 1. As much as possible (I would leave the operating system at least 1 GB of memory). It depends also on the workload you have. For streaming workload with very small state, you can use Flink with 1-2 GB of heap space and still get very good performance. 2. Yes, I would recommend to run

Re: PartitionedState and watermark of Window coGroup()

2016-12-14 Thread Robert Metzger
Hi, elements are coGrouped on the specified key. So only elements with the same key in both streams end up in the same group. Yes, the watermark uses the minimum of both streams. On Tue, Dec 13, 2016 at 7:02 PM, Sendoh wrote: > Hi Flink users, > > I'm a bit confused about how these two work
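
The statement that the watermark uses the minimum of both streams can be illustrated with a small model. This is a simplified sketch, not Flink's implementation: `TwoInputWatermark`, `onLeft`, and `onRight` are hypothetical names, and the model assumes each input starts at `Long.MIN_VALUE` until it reports a watermark.

```java
// Simplified model of a two-input operator's event-time clock.
final class TwoInputWatermark {
    private long left = Long.MIN_VALUE;   // last watermark seen on the first stream
    private long right = Long.MIN_VALUE;  // last watermark seen on the second stream

    // Record a watermark from the first stream and return the combined watermark.
    long onLeft(long watermark) {
        left = Math.max(left, watermark); // watermarks never move backwards
        return current();
    }

    // Record a watermark from the second stream and return the combined watermark.
    long onRight(long watermark) {
        right = Math.max(right, watermark);
        return current();
    }

    // The operator can only be as far along in event time as its slowest input.
    long current() {
        return Math.min(left, right);
    }

    public static void main(String[] args) {
        TwoInputWatermark wm = new TwoInputWatermark();
        System.out.println(wm.onLeft(10));  // the second stream has not reported yet
        System.out.println(wm.onRight(5));
        System.out.println(wm.onLeft(20));
        System.out.println(wm.onRight(15));
    }
}
```

In this model a window on the combined stream cannot fire until both inputs have advanced past its end, which is why one slow or idle input holds back the result.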

Re: How to retrieve values from yarn.taskmanager.env in a Job?

2016-12-14 Thread Robert Metzger
Looks like the issue was resolved in the JIRA issue: https://issues.apache.org/jira/browse/FLINK-5322 On Tue, Dec 13, 2016 at 7:32 PM, Shannon Carey wrote: > Till, > > Unfortunately, System.getenv() doesn't contain the expected variable even > within the UDFs, but thanks for the info! > > In the

Re: checkpoint notifier not found?

2016-12-14 Thread Robert Metzger
Hi Abhishek, you cannot push to the Flink repository directly. Only Flink committers are allowed to do that. But you can fork the Flink repository on GitHub to your own GitHub account and then push the changes to your GitHub. Then, you can create a pull request to offer those changes to the main F

Re: [DISCUSS] "Who's hiring on Flink" monthly thread in the mailing lists?

2016-12-13 Thread Robert Metzger
Exactly. On Tue, Dec 13, 2016 at 4:40 PM, Timur Shenkao wrote: > How to subscribe? > community-subscr...@flink.apache.org ? > > On Tue, Dec 13, 2016 at 6:32 PM, Robert Metzger > wrote: > >> The commun...@flink.apache.org has been created :) >> >> On T

Re: [DISCUSS] "Who's hiring on Flink" monthly thread in the mailing lists?

2016-12-13 Thread Robert Metzger
The commun...@flink.apache.org has been created :) On Tue, Dec 13, 2016 at 10:43 AM, Robert Metzger wrote: > +1. I've requested the community@ mailing list from infra. > > On Tue, Dec 13, 2016 at 10:40 AM, Kostas Tzoumas > wrote: > >> It seems that several folks

Re: [DISCUSS] "Who's hiring on Flink" monthly thread in the mailing lists?

2016-12-13 Thread Robert Metzger
> >> I am looking for a job I won't register for a mailing list or browse > >> through the archive of one but rather search it via Google. So what > about > >> putting it on a dedicated site on the Web Page. This feels more > intuitive > >> to me and gi

Re: [DISCUSS] "Who's hiring on Flink" monthly thread in the mailing lists?

2016-12-09 Thread Robert Metzger
ibing to a new mailing list is an overhead. > As a temp solution, we could cc the dev and user list in the first few > (say, 3) threads and encourage folks in these threads to sign up for the > news@ list. > > On Thu, Dec 8, 2016 at 10:07 AM, Robert Metzger > wrote: > >> T

Re: [DISCUSS] "Who's hiring on Flink" monthly thread in the mailing lists?

2016-12-08 Thread Robert Metzger
Thank you for speaking up Kanstantsin. I really don't want to downgrade the experience on the user@ list. I wonder if jobs@flink would be a too narrowly-scoped mailing list. Maybe we could also start a community@flink (alternatively also general@) mailing list for everything relating to the broade

Re: Resource under-utilization when using RocksDb state backend

2016-12-05 Thread Robert Metzger
'm not very experienced in sussing out > IO issues so perhaps there is something else I'm missing. > > I'll keep investigating. If I continue to come up empty then I guess my > next steps may be to stage some independent tests directly against RocksDb. > > -Cliff >

Re: Flink 1.1.3 OOME Permgen

2016-12-05 Thread Robert Metzger
atic field is a cache, > which is filled by all classes, which were serialized/deserialized by > Jackson. > > Best, > > Konstantin > > On 05.12.2016 11:55, Robert Metzger wrote: > > I've submitted Wordcount 410 times to a testing cluster and a streaming > > jo

Re: flink-job-in-yarn,has max memory

2016-12-05 Thread Robert Metzger
Hi, The TaskManager reports a total memory usage of 3 GB. That's fine, given that you requested containers of size 4GB. Flink doesn't allocate all the memory assigned to the container to the heap. Are you running a batch or a streaming job? On Tue, Nov 29, 2016 at 12:43 PM, wrote: > Hi, >

Re: CEP issue

2016-12-05 Thread Robert Metzger
Hi Kieran, which state backend are you using for your CEP job? Using RocksDB as a state backend could potentially fix the issue. What's the number of keys in your stream? On Tue, Nov 29, 2016 at 3:18 PM, kieran . wrote: > Hello, > > I am currently building a multi-tenant monitoring application

Re: Dealing with Multiple sinks in Flink

2016-12-05 Thread Robert Metzger
For enabling JMX when starting Flink from your IDE, you need to do the following: Configuration configuration = new Configuration(); configuration.setString("metrics.reporters", "my_jmx_reporter"); configuration.setString("metrics.reporter.my_jmx_reporter.class", "org.apache.flink.metrics.jmx.JMXR

Re: Dealing with Multiple sinks in Flink

2016-12-05 Thread Robert Metzger
Hi Vinay, the JMX port depends on the port you've configured for the JMX metrics reporter. Did you configure it? Regards, Robert On Fri, Dec 2, 2016 at 11:14 AM, vinay patil wrote: > Hi Robert, > > I had resolved this issue earlier as I had not set the Kafka source > parallelism to number of

Re: Flink 1.1.3 OOME Permgen

2016-12-05 Thread Robert Metzger
I've submitted Wordcount 410 times to a testing cluster and a streaming job 290 times and I could not reproduce the issue with 1.1.3. Also, the heapdump of one of the TaskManagers looked pretty normal. Do you have any ideas how to reproduce the issue? On Fri, Dec 2, 2016 at 3:21 PM, R

Re: In 1.2-SNAPSHOT, EventTimeSessionWindows are not firing untill the whole stream is processed

2016-12-05 Thread Robert Metzger
>> the time of the question)). Here is the watermark assignment : >> >> .assignTimestampsAndWatermarks(new >> AscendingTimestampExtractor>() >> { >> @Override >> public long extractAscendingTimestamp(Tuple3 >> tuple3) { >>

Re: Resource under-utilization when using RocksDb state backend

2016-12-05 Thread Robert Metzger
Hi Cliff, which Flink version are you using? Are you using event time or processing time windows? I suspect that your disks are "burning" (= your job is IO bound). Can you check with a tool like "iotop" how much disk IO Flink is producing? Then, I would set this number in relation with the theoret

Re: In 1.2-SNAPSHOT, EventTimeSessionWindows are not firing untill the whole stream is processed

2016-12-05 Thread Robert Metzger
Hi Yassine, are you sure your watermark extractor is the same between the two versions? It sounds a bit like the watermarks for the 1.2 code are not generated correctly. Regards, Robert On Sat, Dec 3, 2016 at 9:01 AM, Yassine MARZOUGUI wrote: > Hi all, > > With 1.1-SNAPSHOT, EventTimeSessionWi

Re: Flink 1.1.3 OOME Permgen

2016-12-02 Thread Robert Metzger
Thank you for reporting the issue Konstantin. I've filed a JIRA for the Jackson issue: https://issues.apache.org/jira/browse/FLINK-5233. As I said in the JIRA, I propose to upgrade to Jackson 2.7.8, as this version contains the fix for the issue, but it's not a major Jackson upgrade. Any chance you

Re: Container JMX port setting / discovery for Flink on YARN

2016-11-25 Thread Robert Metzger
Hi Yury, Flink is using its own JMX server instance (not the JVM's one). Therefore, you can configure the server yourself. Check out this documentation page: https://ci.apache.org/projects/flink/flink-docs-release-1.2/monitoring/metrics.html#reporter metrics.reporter.my_jmx_reporter.class: org.ap

Re: State Serializer/Deserializer between savepoints

2016-11-25 Thread Robert Metzger
Hi Daniel, This is currently a limitation in Flink's savepoints. You can not change the serialization schema of the state between savepoints. In Flink 1.2 there might be the first building blocks available for using serializers aware of savepoints. Exposing this feature to the API will probably ta

Re: Error while running Yahoo Streaming Benchmarks on a single machine

2016-11-23 Thread Robert Metzger
Hi, this is not really a failure. It just means that the job has been cancelled by somebody (using the web interface or the ./bin/flink tool). On Sat, Nov 19, 2016 at 3:59 AM, Muhammad Haseeb Javed < 11besemja...@seecs.edu.pk> wrote: > I am trying to run the Yahoo Streaming Benchmarks on a singl

Re: Flink Streaming Data Source Node

2016-11-23 Thread Robert Metzger
Hi, I'm not sure if I fully understood your question. The number of input sources is always less than or equal to the number of slots in one node. Usually source instances are equally distributed among the parallel workers (TaskManagers). Maybe this document describing the deployment model is also help

Re: Reading files from an S3 folder

2016-11-23 Thread Robert Metzger
Hi, This is not the expected behavior. Each parallel instance should read only one file. The files should not be read multiple times by the different parallel instances. How did you check / find out that each node is reading all the data? Regards, Robert On Tue, Nov 22, 2016 at 7:42 PM, Alex Reid

Re: Sink not switched to "RUNUNG" even though a task slot is available

2016-11-23 Thread Robert Metzger
Hi Yassine, you don't necessarily need to set the parallelism of the last two operators to 31, the sink with parallelism 1 will still fit into the slots. A task slot can, by default, hold an entire "slice" or parallel instance of a job. The reason why the sink stays in state CREATE in the beginni

Re: S3 checkpointing in AWS in Frankfurt

2016-11-23 Thread Robert Metzger
Hi Jonathan, have you tried using Amazon's latest EMR Hadoop distribution? Maybe they've fixed the issue in their distribution for older Hadoop releases? On Wed, Nov 23, 2016 at 4:38 PM, Scott Kidder wrote: > Hi Jonathan, > > You might be better off creating a small Hadoop HDFS cluster just for the > purpos

Re: Running the JobManager and TaskManager on the same node in a cluster

2016-11-19 Thread Robert Metzger
Hi Dominik, Your observation is right, running the JobManager and TaskManager on the same node is no problem. If that machine fails, both services will be affected, but as long as you have infrastructure in place (YARN for example) to start them somewhere else, nothing bad will happen. Regarding

Re: RDF/SPARQL and Flink

2016-11-19 Thread Robert Metzger
Hi Tomas, I'm really not an RDF processing expert, but since nobody responded for 4 days, I'll try to give you some pointers: I know that there've been discussions regarding RDF processing on this mailing list before. Check out this one for example: http://apache-flink-user-mailing-list-archive.23

Re: Flink streaming with 1+ TB of managed state

2016-11-19 Thread Robert Metzger
Hi Steven, According to this presentation, King.com is using Flink with terabytes of state: http://flink-forward.org/wp-content/uploads/2016/07/Gyulo-Fo%CC%81ra-RBEA-Scalable-Real-Time-Analytics-at-King.compressed.pdf (see Page 4 specifically) For the 90GB experiment, what is the expected time fo

Re: flink-dist shading

2016-11-19 Thread Robert Metzger
Hi Craig, I also received only this email (and I'm a moderator of the dev@ list, so the message never made it into Apache's infra) When this issue was first reported [1][2] I asked on the Maven mailing list what's going on [3]. I think this JIRA contains the most information on the issue: https://

Re: Flink Avro Kafka Reading/Writing

2016-11-12 Thread Robert Metzger
Hi, yes, Flink can read and write Avro schema to Kafka, using a custom serialization / deserialization schema. On Fri, Nov 11, 2016 at 6:05 AM, daviD wrote: > Hi All, > > Does anyone know if Flink can read and write Avro schema to Kafka? > > Thanks > > daviD >

Re: Flink - Nifi Connectors - Class not found

2016-11-12 Thread Robert Metzger
Hi, the problem is that Flink's YARN code is not available in the Hadoop 1.2.1 build. How do you try to execute the Flink job to trigger this error message? On Fri, Nov 11, 2016 at 12:23 PM, PACE, JAMES wrote: > I am running Apache Flink 1.1.3 – Hadoop version 1.2.1 with the NiFi > connector.

Re: Flink Kafka Connector behaviour on leader change and broker upgrade

2016-11-06 Thread Robert Metzger
Hi, yes, the Flink Kafka connector for Kafka 0.8 handles broker leader changes without failing. The SimpleConsumer provided by Kafka 0.8 doesn't handle that. The 0.9 Flink Kafka consumer also supports broker leader changes transparently. If you keep using the Flink Kafka 0.8 connector with a 0.9 b

Re: Kinesis Connector Dependency Problems

2016-11-04 Thread Robert Metzger
ng properly with EMR 4.8. It seems so obvious in > retrospect... thanks again for the assistance! > > Cheers, > > Justin > > On Tue, Nov 1, 2016 at 11:44 AM, Robert Metzger > wrote: > >> Hi Justin, >> >> thank you for sharing the classpath of the

Re: Flink Kafka 0.10.0.0 connector

2016-11-03 Thread Robert Metzger
e jar distribution as a clean maven package (without running > the tests). > > Thanks, > Dominik > > On 3 Nov 2016, at 13:29, Robert Metzger wrote: > > Hi Dominik, > > Some of Kafka's APIs changed between Kafka 0.9 and 0.10, so you can not > compile the Kafk

Re: Flink Kafka 0.10.0.0 connector

2016-11-03 Thread Robert Metzger
Hi Dominik, Some of Kafka's APIs changed between Kafka 0.9 and 0.10, so you can not compile the Kafka 0.9 against Kafka 0.10 dependencies. I think the easiest way to get Kafka 0.10 running with Flink is to use the Kafka 0.10 connector in the current Flink master. You can probably copy the connect

Re: Kinesis Connector Dependency Problems

2016-11-01 Thread Robert Metzger
Hi Justin, thank you for sharing the classpath of the Flink container with us. It contains what Till was already expecting: An older version of the AWS SDK. If you have some spare time, could you quickly try to run your program with a newer EMR version, just to validate our suspicion? If the erro

Re: Kafka + Flink, Basic Questions

2016-10-31 Thread Robert Metzger
Hi Matt, This is a fairly extensive question. I'll try to answer all of them, but I don't have the time right now to extensively discuss the architecture of your application. Maybe there's some other person on the ML who can extend my answers. (Answers in-line below) On Mon, Oct 31, 2016 at 3:37

Re: FlinkKafkaConsumerBase - Received confirmation for unknown checkpoint

2016-10-27 Thread Robert Metzger
Hi, it would be nice if you could check with a stable version as well. Thank you! On Thu, Oct 27, 2016 at 9:58 AM, PedroMrChaves wrote: > Hello, > > I Am using the version 1.2-SNAPSHOT. > I will try with a stable version to see if the problem persists. > > Regards, > Pedro Chaves. > > > > -- >

Re: "Slow ReadProcessor" warnings when using BucketSink

2016-10-26 Thread Robert Metzger
irectly > writing to HDFS. > > I can also run a terasort or teragen in parallel without any problems. > > Best, > Max > > 2016-10-12 11:32 GMT+02:00 Robert Metzger : > >> Hi, >> I haven't seen this error before. Also, I didn't find anything helpful >

Re: Distributing Tasks over Task manager

2016-10-26 Thread Robert Metzger
nt of sub tasks, the >> following tasks with a parallelism of 2 are distributed over the two task >> manager. >> >> Interesting is also that the task manager have 6 task slots configured >> and the expensive part has 6 sub tasks on one task manager but still >> ev

Re: Watermarks and window firing

2016-10-26 Thread Robert Metzger
Just for others who are wondering what this email is about: I suspect that this email was sent accidentally and that this is the correct one: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Event-time-watermarks-and-windows-td9687.html On Mon, Oct 24, 2016 at 4:56 PM, Paul Joir

Re: Unit testing a Kafka stream based application?

2016-10-26 Thread Robert Metzger
Hi Niels, Sorry for the late response. You can launch a Kafka broker within a JVM and use it for testing purposes. Flink's Kafka connector is using that a lot for integration tests. Here is the code starting the Kafka server: https://github.com/apache/flink/blob/770f2f83a81b2810aff171b2f56390ef68

Re: Side effects or multiple sinks on streaming jobs?

2016-10-26 Thread Robert Metzger
ore than one tuple in my operations then flatmap > and use a KeyedSerializationSchema? > > or is there a way to emit a tuple to another sink from within operations > directly? > > On Wed, Oct 26, 2016 at 9:20 AM, Robert Metzger > wrote: > >> Hi Luis, >> >> Y

Re: FlinkKafkaConsumerBase - Received confirmation for unknown checkpoint

2016-10-26 Thread Robert Metzger
Hi Pedro, The message is a bit unexpected for me as well, but it does not make the checkpointing inconsistent. The only consequence of this warning is that the offsets are not written to ZooKeeper. Which Flink version are you using? On Mon, Oct 24, 2016 at 7:25 PM, Aljosc

Re: Side effects or multiple sinks on streaming jobs?

2016-10-26 Thread Robert Metzger
Hi Luis, You can define as many data sinks as you want in a Flink job topology. So it's not a problem for your use case to define two Kafka sinks, sending data to different topics. Regards, Robert On Tue, Oct 25, 2016 at 3:30 PM, Luis Mariano Guerra < mari...@event-fabric.com> wrote: > hi, > >

Re: add FLINK_LIB_DIR to classpath on yarn -> add properties file to class path on yarn

2016-10-26 Thread Robert Metzger
Hi Vinay, the JobManager and TaskManager logs contain the classpath used when starting a container on YARN. Can you check if the yaml file is in the classpath? On Tue, Oct 25, 2016 at 8:28 AM, vinay patil wrote: > Hi Max, > > As discussed here , I have put my yaml file in the flink lib director

[DISCUSS] Drop Hadoop 1 support with Flink 1.2

2016-10-13 Thread Robert Metzger
Hi, The Apache Hadoop community has recently released the first alpha version for Hadoop 3.0.0, while we are still supporting Hadoop 1. I think it's time to finally drop Hadoop 1 support in Flink. The last minor Hadoop 1 release was on 27 June, 2014. Apache Spark dropped Hadoop 1 support with thei

Re: Flink Kafka Consumer Behaviour

2016-10-13 Thread Robert Metzger
Thank you for investigating the issue. I've filed a JIRA: https://issues.apache.org/jira/browse/FLINK-4822 On Wed, Oct 12, 2016 at 8:12 PM, Anchit Jatana wrote: > Hi Janardhan/Stephan, > > I just figured out what the issue is (Talking about Flink KafkaConnector08, > don't know about Flink KafkaC

Re: Flink Kafka connector08 not updating the offsets with the zookeeper

2016-10-13 Thread Robert Metzger
Okay, I see. According to this document, we need to set a consumer id for each groupid and topic: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper I created a JIRA for fixing this issue: https://issues.apache.org/jira/browse/FLINK-4822 On Wed, Oct 12, 2016 at

Re: Data Transfer between TM should be encrypted

2016-10-13 Thread Robert Metzger
Hi, the release dates depend on the community, when features are ready and so on. There was no discussion yet when we plan to do the release, because most of the features we want to have in are not done yet. I think it's likely that we'll have a 1.2 release by end of this year. Regards, Robert

Re: bucketing in RollingSink

2016-10-13 Thread Robert Metzger
tion where I could follow its progress? > > > > Thanks again! > > > > *From: *Robert Metzger > *Reply-To: *"user@flink.apache.org" > *Date: *Wednesday, October 12, 2016 at 5:50 PM > *To: *"user@flink.apache.org" > *Subject: *Re: bucketin

Re: bucketing in RollingSink

2016-10-12 Thread Robert Metzger
Hi Robert, I see two possible workarounds: 1) You use the unreleased Flink 1.2-SNAPSHOT version. From time to time, there are some unstable commits in that version, but most of the time, it's quite stable. We provide nightly binaries and maven artifacts for snapshot versions here: http://flink.apac

Re: Distributing Tasks over Task manager

2016-10-12 Thread Robert Metzger
Hi Jürgen, Are you using the DataStream or the DataSet API? Maybe the operator chaining is causing too many operations to be "packed" into one task. Check out this documentation page: https://ci.apache.org/projects/flink/flink-docs-master/dev/datastream_api.html#task-chaining-and-resource-groups

Re: Tumbling window rich functionality

2016-10-12 Thread Robert Metzger
Hi, apply() will be called for each key. On Wed, Oct 12, 2016 at 2:25 PM, Swapnil Chougule wrote: > Thanks Aljoscha. > > Whenever I am using WindowFunction.apply() on keyed stream, apply() will > be called once or multiple times (equal to number of keys in that windowed > stream)? > > Ex: > Data
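
The per-key behaviour described in the reply above can be modelled in plain Java. This is a sketch of the semantics only, not Flink code: `PerKeyWindow.evaluate` is a hypothetical helper that takes the contents of a single window, groups them by key, and invokes the supplied function once for every distinct key.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

final class PerKeyWindow {
    // Groups one window's elements by key and calls `apply` once per distinct key.
    static <K, V, R> Map<K, R> evaluate(List<Map.Entry<K, V>> windowContents,
                                        BiFunction<K, List<V>, R> apply) {
        Map<K, List<V>> byKey = new LinkedHashMap<>();
        for (Map.Entry<K, V> element : windowContents) {
            byKey.computeIfAbsent(element.getKey(), k -> new ArrayList<>()).add(element.getValue());
        }
        Map<K, R> results = new LinkedHashMap<>();
        for (Map.Entry<K, List<V>> group : byKey.entrySet()) {
            // One invocation per key, receiving only that key's elements.
            results.put(group.getKey(), apply.apply(group.getKey(), group.getValue()));
        }
        return results;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> window = new ArrayList<>();
        window.add(new AbstractMap.SimpleEntry<>("a", 1));
        window.add(new AbstractMap.SimpleEntry<>("b", 2));
        window.add(new AbstractMap.SimpleEntry<>("a", 3));

        Map<String, Integer> sums = evaluate(window, (key, values) -> {
            int sum = 0;
            for (int v : values) {
                sum += v;
            }
            return sum;
        });
        System.out.println(sums); // one entry per key: a and b
    }
}
```

With three elements and two distinct keys, the function runs twice for that window, once with the values of "a" and once with the values of "b".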

Re: Flink Kafka connector08 not updating the offsets with the zookeeper

2016-10-12 Thread Robert Metzger
Hi Anchit, Can you re-run your job with the log level for Flink set to DEBUG? Then, you should see the following log message every time the offset is committed to ZooKeeper: "Committing offsets to Kafka/ZooKeeper for checkpoint" Alternatively, can you check whether the offsets are available in

Re: "Slow ReadProcessor" warnings when using BucketSink

2016-10-12 Thread Robert Metzger
Hi, I haven't seen this error before. Also, I didn't find anything helpful searching for the error on Google. Did you check the GC times also for Flink? Is your Flink job doing any heavy tasks (like maintaining large windows, or other operations involving a lot of heap space?) Regards, Robert O

Re: Data Transfer between TM should be encrypted

2016-10-12 Thread Robert Metzger
Hi, I think that pull request will be merged for 1.2. On Fri, Oct 7, 2016 at 6:26 PM, vinay patil wrote: > Hi Stephan, > > https://github.com/apache/flink/pull/2518 > Is this pull request going to be part of 1.2 release ? Just wanted to get > an idea on timelines so that I can pass on to the tea

Re: RawSchema as deserialization schema

2016-09-26 Thread Robert Metzger
The RawSchema was once part of Flink's Kafka connector. I've removed it because its implementation is trivial and I didn't expect that there are many people who need the schema (also, I think I saw people using a map() operator after the consumer to deserialize the byte[] into their formats). As S

Re: Simple batch job hangs if run twice

2016-09-22 Thread Robert Metzger
Can you try running with DEBUG logging level? Then you should see if input splits are assigned. Also, you could try to use a debugger to see what's going on. On Mon, Sep 19, 2016 at 2:04 PM, Yassine MARZOUGUI < y.marzou...@mindlytix.com> wrote: > Hi Chensey, > > I am running Flink 1.1.2, and usin

Re: how to unit test streaming window jobs?

2016-09-22 Thread Robert Metzger
Hi Luis, using Event Time windows, you should be able to generate some test data and get predictable results. Flink is internally using similar tests to ensure correctness of the windowing implementation (for example the EventTimeWindowCheckpointingITCase). Regards, Robert On Mon, Sep 12, 2016 a

Re: Distributed Cache support for StreamExecutionEnvironment

2016-09-09 Thread Robert Metzger
Hi Swapnil, there's no support for something like DistributedCache in the DataStream API. However, as a workaround, you can rely on the RichFunction's open() method to load such data directly from a distributed file system. Regards, Robert On Fri, Sep 9, 2016 at 8:13 AM, Swapnil Chougule wrot
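
The workaround mentioned above (loading side data in open() instead of relying on a distributed cache) follows a common pattern that can be sketched without Flink. Everything here is an assumption made for illustration: `LookupEnricher` is a hypothetical stand-in for a rich function, the lookup file uses a made-up `key=value` line format, and a local temporary file plays the role of the distributed file system.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Stand-in for a rich function: the constructor only stores the path,
// the data itself is loaded once in open(), before any record is processed.
final class LookupEnricher {
    private final String lookupPath;
    private Map<String, String> lookup; // populated by open()

    LookupEnricher(String lookupPath) {
        this.lookupPath = lookupPath;
    }

    // Called once per parallel instance before processing starts.
    void open() throws IOException {
        Map<String, String> loaded = new HashMap<>();
        for (String line : Files.readAllLines(Paths.get(lookupPath), StandardCharsets.UTF_8)) {
            int separator = line.indexOf('=');
            if (separator > 0) {
                loaded.put(line.substring(0, separator), line.substring(separator + 1));
            }
        }
        lookup = loaded;
    }

    // Called per record; only reads the data loaded in open().
    String map(String key) {
        if (lookup == null) {
            throw new IllegalStateException("open() was not called");
        }
        return key + " -> " + lookup.getOrDefault(key, "unknown");
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("lookup", ".txt");
        try {
            Files.write(file, Arrays.asList("a=1", "b=2"), StandardCharsets.UTF_8);
            LookupEnricher enricher = new LookupEnricher(file.toString());
            enricher.open();
            System.out.println(enricher.map("a"));
            System.out.println(enricher.map("z"));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Keeping the load in open() rather than the constructor means the data is read on the worker that runs the function, not on the client that builds the job.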

Re: Administration of running jobs

2016-09-09 Thread Robert Metzger
Hi Marek, You can use the RemoteExecutionEnvironment to submit a job programmatically to a Flink cluster. There is also some ongoing work to programmatically control submitted jobs: https://issues.apache.org/jira/browse/FLINK-4272. But for now you would probably need to hack something using the Job

Re: Unable to find KvState flink Queryablestate

2016-09-09 Thread Robert Metzger
Hi, I fear that you have to look into the Flink code to understand what's going on. Queryable state is an experimental, undocumented feature, and the committer who mainly implemented it (Ufuk) is currently on vacation. There's probably some problem with the registration of the state at the jobman

Re: NoClassDefFoundError with ElasticsearchSink on Yarn

2016-09-09 Thread Robert Metzger
Hi Steffen, I think it would be good to add it to the documentation. Would you like to open a pull request? Regards, Robert On Mon, Sep 5, 2016 at 10:26 PM, Steffen Hausmann < stef...@hausmann-family.de> wrote: > Thanks Aris for your explanation! > > A guava version mismatch was indeed the pr

Re: fsbackend with nfs

2016-09-07 Thread Robert Metzger
Hi CPC, It should be possible to use the FsBackend with NFS. However, I'm not sure how well it will perform. Regards, Robert On Mon, Sep 5, 2016 at 2:11 PM, CPC wrote: > Hi, > > Is it possible to use flinkstatebackend with nfs? We dont want to deploy > hadoop in our environment and we want to

Re: [SUGGESTION] Stack Overflow Documentation for Apache Flink

2016-09-05 Thread Robert Metzger
It seems that the content on SO is licensed under CC BY-SA 3.0 with attribution required. The Apache Legal FAQ is not completely clear about that case: http://www.apache.org/legal/resolved.html#cc-sa But if we want, we could at least ask the legal PMC if we can add some of the content from SO into t

Re: How to get latency info from benchmark

2016-09-03 Thread Robert Metzger
> $ git checkout 547e7490fb99562ca15a2127f0ce1e784db97f3e > fatal: reference is not a tree: 547e7490fb99562ca15a2127f0ce1e784db97f3e > -- > > Regards, > Eric > > On Fri, Sep 2, 2016 at 12:01 PM, Robert Metzger > wrote: > >> Hi Eric, >> >> I'm sorry that you

Re: How to get latency info from benchmark

2016-09-02 Thread Robert Metzger
and 0.9.1. Could you tell > me the SHA you were using? > > Regards, > Eric > > > On Wed, Aug 24, 2016 at 4:57 PM, Robert Metzger > wrote: > >> Hi, >> >> Version 0.10-SNAPSHOT is pretty old. The snapshot repository of Apache >> probably doesn't

Re: Resource isolation in flink among multiple jobs

2016-08-29 Thread Robert Metzger
> TaskManager and multiple TaskManager instances per machine? > > On Mon, Aug 29, 2016 at 7:13 PM, Robert Metzger > wrote: > >> Hi, >> >> for isolation, we recommend using YARN, or soon Mesos. >> >> For standalone installations, you'll need to manually

Re: Resource isolation in flink among multiple jobs

2016-08-29 Thread Robert Metzger
Hi, for isolation, we recommend using YARN, or soon Mesos. For standalone installations, you'll need to manually set up multiple independent Flink clusters within one physical cluster if you want them to be isolated. Regards, Robert On Mon, Aug 29, 2016 at 1:41 PM, Abhishek Agarwal wrote: >

Re: Programatically collect taskmanagers details from Job Manager

2016-08-29 Thread Robert Metzger
Hi, I think the JMX port is logged, but not accessible through the REST interface of the JobManager. I think it's a useful feature. If you want, you can file a JIRA for it. On Mon, Aug 29, 2016 at 12:14 PM, Sreejith S wrote: > Thank you Stephen ! That helps, > > Is it possible to get the JMX_POR

Re: Flink JMX

2016-08-29 Thread Robert Metzger
Hi, I think in Flink 1.1.1 JMX will be started on port 8080, 8081 or 8082 (on the JM, 8081 is probably occupied by the web interface). On Mon, Aug 29, 2016 at 1:25 PM, Sreejith S wrote: > Hi Chesnay, > > I added the below configuration in flink-conf in each taskmanagers. (flink > 1.0.3 version

Re: Dynamic scaling in flink

2016-08-29 Thread Robert Metzger
Hi, this JIRA is a good starting point: https://issues.apache.org/jira/browse/FLINK-3755 If you don't care about processing guarantees and you are using a stateless streaming job, you can implement a simple Kafka consumer that uses Kafka's consumer group mechanism. I recently implemented such a Ka

Re: different Kafka serialization for keyed and non keyed messages

2016-08-29 Thread Robert Metzger
Hi Rss, > why Flink implements different serialization schemes for keyed and non keyed messages for Kafka? The non-keyed serialization schema is a basic schema, which works for most use cases. For advanced users who need access to the key, offsets, the partition or topic, there's the keyed ser

Re: Flink long-running YARN configuration

2016-08-29 Thread Robert Metzger
The JobManager UI starts when running Flink on YARN. The address of the UI is registered at YARN, so you can also access it through YARN's command line tools or its web interface. On Fri, Aug 26, 2016 at 7:28 PM, Trevor Grant wrote: > Stephan, > > Will the jobmanager-UI exist? E.g. if I am runni

Re: How to set a custom JAVA_HOME when run flink on YARN?

2016-08-29 Thread Robert Metzger
The "env.java.home" variable is only evaluated by the start scripts, not the YARN code. The solution you've mentioned earlier is a good workaround in my opinion. On Fri, Aug 26, 2016 at 3:48 AM, Renkai wrote: > It seems that this config variant only effect local cluster and stand alone > clust

Re: Kafka and Flink's partitions

2016-08-29 Thread Robert Metzger
Hi rss, Concerning your questions: 1. There is currently no way to avoid the repartitioning. When you do a keyBy(), Flink will shuffle the data through the network. What you would need is a way to tell Flink that the data is already partitioned. If you would use keyed state, you would also need to
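To illustrate why a keyBy() implies a shuffle, here is a simplified plain-Java sketch of hash partitioning. It is not Flink's actual assignment logic, which differs in detail; the class and method names (`KeyByPartitionSketch`, `targetSubtask`) are hypothetical. It only shows the general idea: the target subtask is derived from the key alone, so records with the same key must end up on the same subtask no matter which source partition they were read from.

```java
public class KeyByPartitionSketch {

    // Simplified stand-in for key-based partitioning (an assumption, not Flink's code):
    // the target subtask is a function of the key's hash and the parallelism only.
    static int targetSubtask(Object key, int parallelism) {
        // floorMod keeps the result in [0, parallelism) even for negative hash codes.
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        int parallelism = 4;
        // The same key read from two different Kafka partitions maps to one subtask,
        // so at least one of the two records has to travel over the network.
        System.out.println(targetSubtask("user-42", parallelism));
        System.out.println(targetSubtask("user-42", parallelism));
    }
}
```

Unless the source partitioning happens to agree with this mapping for every key, and the framework is told so, records have to be redistributed after keyBy().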

Re: How to share text file across tasks at run time in flink.

2016-08-29 Thread Robert Metzger
Hi, you could use ZooKeeper if you want to dynamically change the DB name / credentials at runtime. The GlobalJobParameters are immutable at runtime, so you cannot pass updates through them to the cluster. They are intended for parameters for all operators/the entire job and the web interface. Re
