Re: Setting operator parallelism of a running job - Flink 1.2

2017-04-21 Thread Dominik Safaric
Hi Aljoscha,

In other words, jobs must be restarted manually? 

What about using maxParallelism() at the client level? I would expect it to be 
complementary to parallelism.default, in the sense of allowing Flink to handle 
the parallelism of operators and to change it in accordance with runtime 
conditions. However, this does not seem to be the case. 

Best,
Dominik
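
For context, a minimal sketch of how the two settings discussed here are applied 
on the StreamExecutionEnvironment in Flink 1.2 (the values are placeholders). As 
Aljoscha notes below, the max parallelism only bounds what a job can later be 
rescaled to via a savepoint; it does not make Flink adapt the parallelism at runtime.

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParallelismSettingsSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Default parallelism for all operators of this job
        // (overrides parallelism.default from flink-conf.yaml).
        env.setParallelism(16);

        // Upper bound for any future rescaling of this job (number of key groups).
        // Changing the actual parallelism still requires taking a savepoint and
        // resubmitting the job with a different -p value.
        env.setMaxParallelism(128);

        env.fromElements(1, 2, 3).print(); // placeholder pipeline
        env.execute("parallelism settings sketch");
    }
}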

> On 21 Apr 2017, at 15:36, Aljoscha Krettek <aljos...@apache.org> wrote:
> 
> Hi,
> changing the parallelism is not possible while a job is running (currently). 
> What you would have to do to change the parallelism is create a savepoint and 
> then restore from that savepoint with a different parallelism.
> 
> This is the savepoints documentation: 
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/savepoints.html
> 
> Best,
> Aljoscha
>> On 21. Apr 2017, at 15:22, Dominik Safaric <dominiksafa...@gmail.com> wrote:
>> 
>> Hi all,
>> 
>> Is it possible to set the operator parallelism using Flink CLI while a job 
>> is running? 
>> 
>> I have a cluster of 4 worker nodes, where each node has 4 CPUs, hence the 
>> number of task slots per node is set to 4 and parallelism.default to 16. 
>> 
>> However, if a worker fails while the jobs were configured at system level 
>> to run with 16 task slots, I get the exception “Not enough free slots 
>> available to run the job.” and the job is not able to continue but 
>> instead aborts. 
>> 
>> Is this the expected behaviour? Shouldn’t Flink continue the job execution 
>> with, in this case, only 12 slots available? If not, can one change the 
>> parallelism of a job while it is in restart mode in order to allow the job to 
>> continue? 
>> 
>> Thanks,
>> Dominik
> 



Setting operator parallelism of a running job - Flink 1.2

2017-04-21 Thread Dominik Safaric
Hi all,

Is it possible to set the operator parallelism using Flink CLI while a job is 
running? 

I have a cluster of 4 worker nodes, where each node has 4 CPUs, hence the 
number of task slots per node is set to 4 and parallelism.default to 16. 

However, if a worker fails while the jobs were configured at system level to 
run with 16 task slots, I get the exception “Not enough free slots available to 
run the job.” and the job is not able to continue but instead aborts. 

Is this the expected behaviour? Shouldn’t Flink continue the job execution with, 
in this case, only 12 slots available? If not, can one change the 
parallelism of a job while it is in restart mode in order to allow the job to 
continue? 

Thanks,
Dominik

Re: Flink 1.2 time window operation

2017-03-30 Thread Dominik Safaric
> First, some remarks here -  sources (in your case the Kafka consumer) will 
> not stop fetching / producing data when the windows haven’t fired yet.
> 

This is for sure true. However, the plot shows the number of records produced 
per second, where each record was assigned a created-at timestamp at creation 
time, before being pushed back to Kafka. Sorry I did not clarify this 
before. Anyway, because of this I would expect to see a certain lag. Of 
course, messages will not only be produced into Kafka exactly at window expiry, 
with the producer then shutting down - however, what concerns me is that messages 
were produced to Kafka before the first window expired - hence the questions. 

> If you’re writing the outputs of the window operation to Kafka (by adding a 
> Kafka sink after the windowing), then yes it should only write to Kafka when 
> the window has fired.


Hence, the behaviour that you’ve described and that we expected did not occur. 

If it would help, I can share the source code and a detailed Flink configuration. 

Cheers,
Dominik

> On 30 Mar 2017, at 13:09, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
> 
> Hi,
> 
> Thanks for the clarification.
> 
>> What are the reasons behind consuming/producing messages from/to Kafka while 
>> the window has not expired yet?
> 
> First, some remarks here -  sources (in your case the Kafka consumer) will 
> not stop fetching / producing data when the windows haven’t fired yet. Does 
> this explain what you have plotted in the diagram you attached (sorry, I 
> can’t really reason about the diagram because I’m not so sure what the values 
> of the x-y axes represent)?
> 
> If you’re writing the outputs of the window operation to Kafka (by adding a 
> Kafka sink after the windowing), then yes it should only write to Kafka when 
> the window has fired. The characteristics will also differ for different 
> types of windows, so you should definitely take a look at the Windowing docs 
> [1] about them.
> 
> Cheers,
> Gordon
> 
> [1] 
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/windows.html#window-assigners
> On March 30, 2017 at 2:37:41 PM, Dominik Safaric (dominiksafa...@gmail.com) wrote:
> 
>> What are the reasons behind consuming/producing messages from/to Kafka while 
>> the window has not expired yet?



Re: Flink 1.2 time window operation

2017-03-30 Thread Dominik Safaric
Hi Gordon,

The job was run using processing time. The Kafka broker version I’ve used was 
0.10.1.1. 

Dominik

> On 30 Mar 2017, at 08:35, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
> 
> Hi Dominik,
> 
> Was the job running with processing time or event time? If event time, how 
> are you producing the watermarks?
> Normally to understand how windows are firing in Flink, these two factors 
> would be the place to look at.
> I can try to further explain this once you provide info with these. Also, are 
> you using Kafka 0.10?
> 
> Cheers,
> Gordon
> 
> On March 27, 2017 at 11:25:49 PM, Dominik Safaric (dominiksafa...@gmail.com) wrote:
> 
>> Hi all, 
>> 
>> Lately I’ve been investigating the performance characteristics of Flink as 
>> part of our internal benchmark. As part of this we’ve developed and deployed an 
>> application that polls data from Kafka and groups the data by key during a 
>> fixed time window of one minute.  
>> 
>> In total, the topic that the KafkaConsumer polled from consists of 100 
>> million messages, each 100 bytes in size. What we were expecting is that no 
>> records would be read from, or produced back to, Kafka during the first minute 
>> of the window operation - however, this is unfortunately not the case. Below 
>> you may find a plot showing the number of records produced per second.  
>> 
>> Could anyone provide an explanation of the behaviour shown in the graph 
>> below? What are the reasons behind consuming/producing messages from/to 
>> Kafka while the window has not expired yet?  



Flink 1.2 time window operation

2017-03-27 Thread Dominik Safaric
Hi all,

Lately I’ve been investigating the performance characteristics of Flink as part 
of our internal benchmark. As part of this we’ve developed and deployed an 
application that polls data from Kafka and groups the data by key during a fixed 
time window of one minute. 

In total, the topic that the KafkaConsumer polled from consists of 100 million 
messages, each 100 bytes in size. What we were expecting is that no records would 
be read from, or produced back to, Kafka during the first minute of the window 
operation - however, this is unfortunately not the case. Below you may find a 
plot showing the number of records produced per second. 

Could anyone provide an explanation of the behaviour shown in the graph 
below? What are the reasons behind consuming/producing messages from/to Kafka 
while the window has not expired yet? 

 

Attachment: Flink_6000ms_Window_Throughput (1).pdf (Adobe PDF document)
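
For context, a rough sketch of the kind of job described above (Kafka source, 
keyBy, one-minute processing-time tumbling window, Kafka sink). Topic names, the 
group id and the schema are placeholders, not the actual benchmark code; the point 
is that the sink only receives records when a window fires, while the source keeps 
consuming from Kafka throughout.

import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KeyedCountPerMinute {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("group.id", "benchmark");               // placeholder

        DataStream<String> keys = env.addSource(
                new FlinkKafkaConsumer010<>("input-topic", new SimpleStringSchema(), props));

        keys
            // count occurrences per key ...
            .map(new MapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public Tuple2<String, Integer> map(String key) {
                    return new Tuple2<>(key, 1);
                }
            })
            .keyBy(0)
            // ... over one-minute tumbling processing-time windows
            .timeWindow(Time.minutes(1))
            .sum(1)
            // results only reach this point when a window fires, i.e. once per minute per key
            .map(new MapFunction<Tuple2<String, Integer>, String>() {
                @Override
                public String map(Tuple2<String, Integer> count) {
                    return count.f0 + "," + count.f1;
                }
            })
            .addSink(new FlinkKafkaProducer010<>("output-topic", new SimpleStringSchema(), props));

        env.execute("keyed count per minute");
    }
}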


Re: Benchmarking streaming frameworks

2017-03-23 Thread Dominik Safaric
Dear Giselle,

Various stream processing engine benchmarks already exist. Here are a few of them 
that I believe are worth mentioning:

http://ieeexplore.ieee.org/document/7530084/
https://www.usenix.org/node/188989
https://pdfs.semanticscholar.org/c82f/170fbc837291d94dc0a18f0223d182144339.pdf
https://people.eecs.berkeley.edu/~kubitron/courses/cs262a-F14/projects/reports/project11_report_ver3.pdf
https://hal.inria.fr/hal-01347638/document


Regards,
Dominik

> On 23 Mar 2017, at 11:09, Giselle van Dongen  
> wrote:
> 
> Dear users of Streaming Technologies,
> 
> As a PhD student in big data analytics, I am currently in the process of
> compiling a list of benchmarks (to test multiple streaming frameworks) in
> order to create an expanded benchmarking suite. The benchmark suite is being
> developed as a part of my current work at Ghent University.
> 
> The included frameworks at this time are, in no particular order, Spark,
> Flink, Kafka (Streams), Storm (Trident) and Drizzle. Any pointers to
> previous work or relevant benchmarks would be appreciated.
> 
> Best regards,
> Giselle van Dongen



Re: flink/cancel & shutdown hooks

2017-03-08 Thread Dominik Safaric
I’m not using YARN; I’m starting the cluster using bin/start-cluster.sh. 

> On 8 Mar 2017, at 15:32, Ufuk Celebi <u...@apache.org> wrote:
> 
> On Wed, Mar 8, 2017 at 3:19 PM, Dominik Safaric
> <dominiksafa...@gmail.com> wrote:
>> The cluster consists of 4 workers and a master node.
> 
> Are you starting the cluster via bin/start-cluster.sh or are you using
> YARN etc.?



Re: flink/cancel & shutdown hooks

2017-03-08 Thread Dominik Safaric
I’m deploying the job from the master node of the cluster itself using 
bin/flink run -c   . 

The cluster consists of 4 workers and a master node. 

Dominik

> On 8 Mar 2017, at 15:16, Ufuk Celebi <u...@apache.org> wrote:
> 
> How are you deploying your job?
> 
> Shutdown hooks are executed when the JVM terminates whereas the cancel
> command only cancels the Flink job and the JVM process potentially
> keeps running. For example, running a standalone cluster would keep
> the JVMs running.
> 
> On Wed, Mar 8, 2017 at 9:36 AM, Timo Walther <twal...@apache.org> wrote:
>> Hi Dominik,
>> 
>> did you take a look into the logs? Maybe the exception is not shown in the
>> CLI but in the logs.
>> 
>> Timo
>> 
>> Am 07/03/17 um 23:58 schrieb Dominik Safaric:
>> 
>>> Hi all,
>>> 
>>> I would appreciate any help or advice in regard to default Java
>>> runtime shutdown hooks and canceling Flink jobs.
>>> 
>>> Namely, as part of my Flink application I am using a Kafka interceptor class
>>> that defines a shutdown hook thread. When stopping the Flink streaming job
>>> on my local machine the shutdown hook gets executed; however, I do not see
>>> the same behaviour when stopping the Flink application using bin/flink
>>> cancel .
>>> 
>>> Considering there are no exceptions thrown from the shutdown thread, what
>>> could the root cause of this be?
>>> 
>>> Thanks,
>>> Dominik
>> 
>> 
>> 



flink/cancel & shutdown hooks

2017-03-07 Thread Dominik Safaric
Hi all,

I would appreciate any help or advice in regard to default Java runtime 
shutdown hooks and canceling Flink jobs.

Namely, as part of my Flink application I am using a Kafka interceptor class that 
defines a shutdown hook thread. When stopping the Flink streaming job on my 
local machine the shutdown hook gets executed; however, I do not see the same 
behaviour when stopping the Flink application using bin/flink cancel . 

Considering there are no exceptions thrown from the shutdown thread, what could 
the root cause of this be?

Thanks,
Dominik

Re: FlinkKafkaConsumer010 - creating a data stream of type DataStream<ConsumerRecord<K,V>>

2017-03-07 Thread Dominik Safaric
Hi Gordon,

Thanks for the advice. Following it, I’ve implemented the 
Keyed(De)SerializationSchema and am now able to emit the metadata to 
downstream operators. 

Regards,
Dominik
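
For reference, a minimal sketch of the approach Gordon describes below: a 
KeyedDeserializationSchema that carries the topic, partition and offset along with 
the record value. The tuple layout is an assumption for illustration; any POJO 
would work as well.

import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple4;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.streaming.util.serialization.KeyedDeserializationSchema;

public class RecordWithMetadataSchema
        implements KeyedDeserializationSchema<Tuple4<String, Integer, Long, String>> {

    @Override
    public Tuple4<String, Integer, Long, String> deserialize(
            byte[] messageKey, byte[] message, String topic, int partition, long offset)
            throws IOException {
        // emit (topic, partition, offset, value) instead of just the value
        return new Tuple4<>(topic, partition, offset, new String(message, StandardCharsets.UTF_8));
    }

    @Override
    public boolean isEndOfStream(Tuple4<String, Integer, Long, String> nextElement) {
        return false; // unbounded stream
    }

    @Override
    public TypeInformation<Tuple4<String, Integer, Long, String>> getProducedType() {
        return TypeExtractor.getForObject(new Tuple4<>("", 0, 0L, ""));
    }
}

The schema can then be supplied to the FlinkKafkaConsumer010 constructor in place 
of a plain DeserializationSchema, so downstream operators receive the metadata with 
every record.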

> On 7 Mar 2017, at 07:08, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
> 
> Hi Dominik,
> 
> I would recommend implementing a `KeyedDeserializationSchema` and supplying it to 
> the constructor when initializing your FlinkKafkaConsumer.
> 
> The `KeyedDeserializationSchema` exposes the metadata of the record such as 
> offset, partition, and key. In the schema, you can implement your own logic 
> of turning the binary data from Kafka into your own data types that carry the 
> metadata information along with the record value, e.g. POJOs or Tuples.
> 
> Some links for more info on this:
> 1. 
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/connectors/kafka.html#the-deserializationschema
> 2. 
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/types_serialization.html#flinks-typeinformation-class
> 
> The metadata `KeyedDeserializationSchema` exposes is extracted from 
> `ConsumerRecord`s within the Kafka connector, so it doesn’t make sense to 
> wrap it up again into a `ConsumerRecord`. The schema interface exposes all 
> available metadata of the record, so it should be sufficient.
> 
> Cheers,
> Gordon
> 
> On March 7, 2017 at 3:51:59 AM, Dominik Safaric (dominiksafa...@gmail.com) wrote:
> 
>> Hi, 
>> 
>> Unfortunately I cannot find the option of using raw ConsumerRecord<K,V> 
>> instances when creating a Kafka data stream.  
>> 
>> In general, I would like to use an instance of the mentioned type because 
>> our use case requires certain metadata such as record offset and partition. 
>> 
>> So far I’ve examined the source code of the Kafka connector and checked the 
>> docs, but unfortunately I could not find the option of creating a data 
>> stream of the type DataStream<ConsumerRecord<K,V>>.  
>> 
>> Am I missing something, or do I have to implement it myself and build Flink 
>> from source to have this ability?  
>> 
>> Thanks in advance, 
>> Dominik 



FlinkKafkaConsumer010 - creating a data stream of type DataStream<ConsumerRecord<K,V>>

2017-03-06 Thread Dominik Safaric
Hi,

Unfortunately I cannot find the option of using raw ConsumerRecord<K,V> 
instances when creating a Kafka data stream. 

In general, I would like to use an instance of the mentioned type because our 
use case requires certain metadata such as the record offset and partition.

So far I’ve examined the source code of the Kafka connector and checked the 
docs, but unfortunately I could not find the option of creating a data stream 
of the type DataStream<ConsumerRecord<K,V>>. 

Am I missing something, or do I have to implement it myself and build Flink 
from source to have this ability? 

Thanks in advance,
Dominik  

TaskManager failure detection

2017-02-22 Thread Dominik Safaric
Hi,

As I’m investigating Flink’s fault tolerance capabilities, I would like to 
know which component and class are in charge of TaskManager failure detection and 
checkpoint restoring. In addition, how does Flink actually determine that a 
TaskManager has failed due to e.g. hardware failures? 

To my knowledge, the state should be restored using the 
CheckpointCoordinator or ExecutionGraph. Correct me if I’m wrong. 

Thanks in advance,
Dominik



Debugging, logging and measuring operator subtask performance

2017-01-25 Thread Dominik Safaric
Hi,

As I am experiencing certain performance degradations in a streaming job, I 
want to determine the root cause of it by measuring subtask performance in 
terms of resource utilisation - e.g. CPU utilisation of the thread. 

Is this somehow possible? Does Flink log scheduled and executed threads? What 
approach would you recommend? 

Thanks in advance,
Dominik 
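
One option, if per-subtask numbers would already help: Flink's metrics system lets 
you register metrics inside a rich function and export them per subtask via a 
configured reporter (JMX, Ganglia, etc.). This gives throughput per subtask rather 
than CPU time, so it would still need to be correlated with OS/JVM-level tooling 
for CPU utilisation. A sketch:

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

public class CountingMap extends RichMapFunction<String, String> {

    private transient Counter recordsProcessed;

    @Override
    public void open(Configuration parameters) {
        // registered per parallel subtask, exposed through the configured reporters
        recordsProcessed = getRuntimeContext()
                .getMetricGroup()
                .counter("recordsProcessed");
    }

    @Override
    public String map(String value) {
        recordsProcessed.inc();
        return value;
    }
}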

Re: benchmarking flink streaming

2017-01-25 Thread Dominik Safaric
Hi Stephan,

As I’m already familiar with the latency markers of Flink 1.2, there is one 
question that bothers me in regard to them - how does Flink measure end-to-end 
latency when dealing with e.g. aggregations? 

Suppose you have a topology ingesting data from Kafka, and you want to output 
frequency per key. In this case, the sink is just given tuples of (key: String, 
frequency: Int).   
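
One possible convention for the attach-your-own-timestamps approach Stephan 
mentions below, under the assumption that every record carries a created-at 
timestamp: keep the oldest created-at per window pane inside the aggregation and 
report the difference to the wall clock at the sink, i.e. the end-to-end latency 
of the record that waited longest for the window to fire. A rough sketch:

import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;
import org.apache.flink.streaming.api.windowing.time.Time;

public class PerKeyFrequencyLatency {

    // input: (key, 1, createdAtMillis) - the timestamp is attached at record creation
    public static void attach(DataStream<Tuple3<String, Integer, Long>> input) {
        input
            .keyBy(0)
            .timeWindow(Time.minutes(1))
            .reduce(new ReduceFunction<Tuple3<String, Integer, Long>>() {
                @Override
                public Tuple3<String, Integer, Long> reduce(Tuple3<String, Integer, Long> a,
                                                            Tuple3<String, Integer, Long> b) {
                    // sum the frequencies, keep the oldest created-at in the pane
                    return new Tuple3<>(a.f0, a.f1 + b.f1, Math.min(a.f2, b.f2));
                }
            })
            .addSink(new SinkFunction<Tuple3<String, Integer, Long>>() {
                @Override
                public void invoke(Tuple3<String, Integer, Long> value) {
                    long latencyMs = System.currentTimeMillis() - value.f2;
                    // report latencyMs to a metrics backend of your choice
                }
            });
    }
}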

> On 25 Jan 2017, at 16:11, Stephan Ewen  wrote:
> 
> Hi!
> 
> There are new latency metrics in Flink 1.2 that you can use. They are 
> sampled, so not on every record.
> 
> You can always attach your own timestamps, in order to measure the latency of 
> specific records.
> 
> Stephan
> 
> 
> On Fri, Dec 16, 2016 at 5:02 PM, Meghashyam Sandeep V wrote:
> Hi Stephan,
> 
> Thanks for your answer. Is there a way to get the metrics such as latency of 
> each message in the stream? For eg. I have a Kafka source, Cassandra  sink 
> and I do some processing in between. I would like to know how long does it 
> take for each message from the beginning(entering flink streaming from kafka) 
> to end(sending/executing the query). 
> 
> On Fri, Dec 16, 2016 at 7:36 AM, Stephan Ewen wrote:
> Hi!
> 
> I am not sure there exists a recommended benchmarking tool. Performance 
> comparisons depend heavily on the scenarios you are looking at: Simple event 
> processing, shuffles (grouping aggregation), joins, small state, large state, 
> etc...
> 
> As far as I know, most people try to write a "mock" version of a job that is 
> representative of the jobs they want to run, and test with that.
> 
> That said, I agree that it would actually be helpful to collect some jobs in 
> a form of "evaluation suite".
> 
> Stephan
> 
> 
> 
> On Thu, Dec 15, 2016 at 6:11 PM, Meghashyam Sandeep V wrote:
> Hi There,
> 
> We are evaluating Flink streaming for real time data analysis. I have my 
> flink job running in EMR with Yarn. What are the possible benchmarking tools 
> that work best with Flink? I couldn't find this information in the Apache 
> website. 
> 
> Thanks,
> Sandeep
> 
> 
> 



Flink 1.1.3 RollingSink - understanding output blocks/parallelism

2016-12-14 Thread Dominik Safaric
Hi everyone,

although this question might sound trivial, I’ve been curious about the 
following. Given a Flink topology with the parallelism set to 6, for example, 
and outputting the data stream to HDFS using an instance of RollingSink, how is 
the output structured? By structured, I mean that this will 
result in 6 distinct part files, whereas I would like to have a single file 
containing all of the output values from the DataStream. 

Regards,
Dominik
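
One way to get a single output file per bucket, assuming the reduced write 
throughput is acceptable, is to give only the sink a parallelism of 1 - a sketch 
(the HDFS path is a placeholder):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.RollingSink;

public class SingleWriterRollingSink {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(6); // upstream operators keep their parallelism

        DataStream<String> stream = env.fromElements("a", "b", "c"); // placeholder source

        // With parallelism 1 on the sink, a single writer task produces the part
        // files, so each bucket contains one file instead of six.
        stream.addSink(new RollingSink<String>("hdfs:///output/path"))
              .setParallelism(1);

        env.execute("single-writer rolling sink");
    }
}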

Flink 1.1.3 RollingSink - mismatch in the number of records consumed/produced

2016-12-12 Thread Dominik Safaric
Hi everyone,

As I’ve implemented a RollingSink writing messages consumed from a Kafka log, 
I’ve observed that there is a significant mismatch between the number of messages 
consumed and the number written to the file system.

Namely, the consumed Kafka topic contains in total 1.000.000 messages. The 
topology does not perform any data transformation whatsoever; instead, data from 
the source is pushed straight to the RollingSink. 

After checksumming the output files, I’ve observed that the total number of 
messages written to the output files is greater than 7.000.000 - a difference of 
6.000.000 records more than consumed/available.

What is the cause of this behaviour? 

Regards,
Dominik   

Partitioning operator state

2016-12-07 Thread Dominik Safaric
Hi everyone,

In the case of scaling out a Flink cluster, how does Flink handle operator 
state partitioning of a staged topology? 

Regards,
Dominik 



Cannot connect to the JobManager - Flink 1.1.3 cluster mode

2016-11-23 Thread Dominik Safaric
Hi all,

As I’ve been setting up a cluster comprised of three worker nodes and a master 
node, I’ve encountered the problem that the JobManager, although running, is 
unreachable. 

The master instance has SSH access to all worker nodes. The worker nodes 
do not, however, have SSH access to the master node. Hence, could this be the 
reason for the exception being thrown? Interestingly, I keep getting the same 
exception even when running as a local cluster. If I try to connect to the 
JobManager manually, by executing for example bin/flink list, I am however able 
to connect to the JobManager. 

In regard to other services, such as the state backend configured via 
Zookeeper, the master is able to connect to e.g. Zookeeper running on a 
different node of the cluster - checked by examining the ZNode created. 

Next, Flink imposes this requirement of SSH when running in cluster mode. 
Since the cluster I am running has a VNET configured, could SSH be bypassed or 
is it a must? 

Thanks in advance,
Dominik

Re: Flink Material & Papers

2016-11-21 Thread Dominik Safaric
Hi Hanna,

I would certainly recommend, if you haven’t done so already, checking the official 
docs of Flink at flink.apache.org. The documentation is comprehensive and 
understandable. 

From that point, I would recommend the following blog posts and academic 
papers: 

 Apache Flink: Stream and Batch Processing in a Single Engine - 
http://sites.computer.org/debull/A15dec/p28.pdf
Lightweight Asynchronous Snapshots for Distributed Dataflows - 
https://arxiv.org/pdf/1506.08603v1.pdf
https://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html
https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
https://ci.apache.org/projects/flink/flink-docs-master/internals/job_scheduling.html

In addition, I would suggest reading the Realtime Data Processing at 
Facebook paper, which describes some of the important characteristics of stream 
processing engines that are generally applicable to Flink as well. 

Regards,
Dominik

> On 21 Nov 2016, at 17:53, Hanna Prinz  wrote:
> 
> Good evening everyone,
> 
> I’m currently writing a term paper about Flink at the HTW Berlin and I wanted 
> to ask you if you can help with papers (or other material) about Flink. I 
> could also come over to the TU if someone's doing a lecture about Flink.
> 
> And now that I’m writing you: I accidentally ran into this guide when I 
> wanted to implement a demo for my presentation (which I know now is meant for 
> development on the Flink Core): 
> https://ci.apache.org/projects/flink/flink-docs-master/internals/ide_setup.html#intellij-idea
>  
> 
> But anyway, I wanted to tell you that the Scala Compiler Plugin can’t be 
> installed as instructed because there’s no „Install Jetbrains Plugin…“ entry in 
> the dialog (see screenshot attached).
> I’m using IntelliJ IDEA 2016.2.5, Build #IU-162.2228.15, built on October 14, 
> 2016 on macOS 10.12.1.
> 
> Many thanks in advance!
> Cheers
> Hanna
> 
> 



Running the JobManager and TaskManager on the same node in a cluster

2016-11-16 Thread Dominik Safaric
Hi,

It is generally recommended for streaming engines, including Flink, to run 
the master on a separate node - in the case of Flink, the JobManager. 

However, why should one run Flink’s JobManager on a separate node? 

Performance-wise, the JobManager is not resource-intensive, unlike, of course, 
the TaskManagers. 

In terms of fault tolerance and a failing JobManager, semantically there is no 
difference. 

Hence, what are the main reasons behind this recommendation? 

Thanks in advance.  

TaskManager log thread

2016-11-11 Thread Dominik Safaric
If taskmanager.debug.memory.startLogThread is set to true, where does the task 
manager output the logs to? 

Unfortunately I couldn’t find this information in the documentation, hence the 
question.

Thanks in advance,
Dominik

Re: Flink Kafka 0.10.0.0 connector

2016-11-03 Thread Dominik Safaric
Hi Robert,

Thanks for sharing this insight. 

However, the Flink Kafka 010 connector is only compatible with the 
1.2-SNAPSHOT. 

Despite that, I’ve managed to get the Flink Kafka 09 connector to use Kafka 
version 0.10.0.1. Only minor changes to the test code had to be made, mostly in 
regard to the Zookeeper utilities. 

Thanks for your help though!
Dominik

> On 3 Nov 2016, at 13:59, Robert Metzger <rmetz...@apache.org> wrote:
> 
> Hi,
> I just tried the Kafka 0.10 connector again, and I could not reproduce the 
> issue you are reporting.
> 
> This is my test job:
> 
> // parse input arguments
> final ParameterTool parameterTool = ParameterTool.fromArgs(args);
> 
> if (parameterTool.getNumberOfParameters() < 4) {
>   System.out.println("Missing parameters!\nUsage: Kafka --topic <topic> " +
>     "--bootstrap.servers <kafka brokers> --zookeeper.connect <zk quorum> " +
>     "--group.id <some id>");
>   return;
> }
> 
> StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment();
> env.getConfig().disableSysoutLogging();
> env.getConfig().setRestartStrategy(RestartStrategies.fixedDelayRestart(4, 
> 1));
> env.enableCheckpointing(5000); // create a checkpoint every 5 seconds
> env.getConfig().setGlobalJobParameters(parameterTool); // make parameters 
> available in the web interface
> 
> DataStream<String> messageStream = env
>   .addSource(new FlinkKafkaConsumer010<>(
>     parameterTool.getRequired("topic"),
>     new SimpleStringSchema(),
>     parameterTool.getProperties()));
> 
> // write kafka stream to standard out.
> messageStream.print();
> 
> env.execute("Read from Kafka example");
> 
> On Thu, Nov 3, 2016 at 1:48 PM, Dominik Safaric <dominiksafa...@gmail.com> wrote:
> Hi Robert,
> 
>> I think the easiest way to get Kafka 0.10 running with Flink is to use the 
>> Kafka 0.10 connector in the current Flink master.
> 
> 
> Well, I’ve already built the Kafka 0.10 connector from the master, but 
> unfortunately I keep getting a type checker error saying that the type of the 
> FlinkKafkaConsumer010 and the one required by StreamExecutionEnvironment are 
> not compatible - that is, addSource requires a subclass of SourceFunction, 
> whereas the FlinkKafkaConsumer010 instance is not recognised as such. 
> 
> Which I find quite strange, because I would assume that the FlinkKafkaConsumer 
> instance should be of type SourceFunction. However, the same even happened 
> while building the FlinkKafkaConsumer09. 
> 
> Any hint what might be going on?
> 
> I’ve built the jar distribution as a clean maven package (without running the 
> tests). 
> 
> Thanks,
> Dominik
> 
>> On 3 Nov 2016, at 13:29, Robert Metzger <rmetz...@apache.org> wrote:
>> 
>> Hi Dominik,
>> 
>> Some of Kafka's APIs changed between Kafka 0.9 and 0.10, so you can not 
>> compile the Kafka 0.9 against Kafka 0.10 dependencies.
>> 
>> I think the easiest way to get Kafka 0.10 running with Flink is to use the 
>> Kafka 0.10 connector in the current Flink master.
>> You can probably copy the connector's code into your own project and use the 
>> new connector from there.
>> 
>> Regards,
>> Robert
>> 
>> 
>> On Thu, Nov 3, 2016 at 8:05 AM, Dominik Safaric <dominiksafa...@gmail.com> wrote:
>> Dear all,
>> 
>> Although the Flink 1.2 version will rollout a Flink Kafka 0.10.x connector, 
>> I want to use the Flink 0.9 connector in conjunction with the 0.10.x 
>> versions. 
>> 
>> The reason behind this is that we are currently evaluating Flink as part of 
>> an empirical research project, hence a stable release is required. In addition, 
>> the reason why we require the Kafka 0.10.x versions is that since 0.10.0 Kafka 
>> supports consumer and producer interceptors and message timestamps.
>> 
>> To make the 0.9 connector support a Kafka version such as 0.10.0, so far 
>> I’ve changed the Flink Kafka 0.9 connector dependency to the required Kafka 
>> version and built the project. However, as I imported the jar and added the 
>> source to the StreamExecutionEnvironment, a type error occurred stating that 
>> the addSource function requires a class deriving from the SourceFunction 
>> interface. 
>> 
>> Hence, what has gone wrong during the build? I assume a dependency issue.
>> 
>> Next, I’ve tried simply overriding the dependencies of the Flink Kafka 
>> connector within the project pom.xml; however, there is obviously a slight 
>> API mismatch, hence this cannot be done. 
>> 
>> I would really appreciate it if anyone could provide some guidance on how to 
>> successfully build the Flink Kafka connector supporting Kafka 0.10.x 
>> versions. 
>> 
>> Thanks in advance,
>> Dominik 
>> 
> 
> 



Re: Flink Kafka 0.10.0.0 connector

2016-11-03 Thread Dominik Safaric
Hi Robert,

> I think the easiest way to get Kafka 0.10 running with Flink is to use the 
> Kafka 0.10 connector in the current Flink master.


Well, I’ve already built the Kafka 0.10 connector from the master, but 
unfortunately I keep getting a type checker error saying that the type of the 
FlinkKafkaConsumer010 and the one required by StreamExecutionEnvironment are not 
compatible - that is, addSource requires a subclass of SourceFunction, whereas 
the FlinkKafkaConsumer010 instance is not recognised as such. 

Which I find quite strange, because I would assume that the FlinkKafkaConsumer 
instance should be of type SourceFunction. However, the same even happened 
while building the FlinkKafkaConsumer09. 

Any hint what might be going on?

I’ve built the jar distribution as a clean maven package (without running the 
tests). 

Thanks,
Dominik

> On 3 Nov 2016, at 13:29, Robert Metzger <rmetz...@apache.org> wrote:
> 
> Hi Dominik,
> 
> Some of Kafka's APIs changed between Kafka 0.9 and 0.10, so you can not 
> compile the Kafka 0.9 against Kafka 0.10 dependencies.
> 
> I think the easiest way to get Kafka 0.10 running with Flink is to use the 
> Kafka 0.10 connector in the current Flink master.
> You can probably copy the connector's code into your own project and use the 
> new connector from there.
> 
> Regards,
> Robert
> 
> 
> On Thu, Nov 3, 2016 at 8:05 AM, Dominik Safaric <dominiksafa...@gmail.com> wrote:
> Dear all,
> 
> Although the Flink 1.2 version will rollout a Flink Kafka 0.10.x connector, I 
> want to use the Flink 0.9 connector in conjunction with the 0.10.x versions. 
> 
> The reason behind this is that we are currently evaluating Flink as part of an 
> empirical research project, hence a stable release is required. In addition, 
> the reason why we require the Kafka 0.10.x versions is that since 0.10.0 Kafka 
> supports consumer and producer interceptors and message timestamps.
> 
> To make the 0.9 connector support a Kafka version such as 0.10.0, so far I’ve 
> changed the Flink Kafka 0.9 connector dependency to the required Kafka version 
> and built the project. However, as I imported the jar and added the source to 
> the StreamExecutionEnvironment, a type error occurred stating that the 
> addSource function requires a class deriving from the SourceFunction 
> interface. 
> 
> Hence, what has gone wrong during the build? I assume a dependency issue.
> 
> Next, I’ve tried simply overriding the dependencies of the Flink Kafka 
> connector within the project pom.xml; however, there is obviously a slight API 
> mismatch, hence this cannot be done. 
> 
> I would really appreciate it if anyone could provide some guidance on how to 
> successfully build the Flink Kafka connector supporting Kafka 0.10.x 
> versions. 
> 
> Thanks in advance,
> Dominik 
> 



Flink Kafka 0.10.0.0 connector

2016-11-03 Thread Dominik Safaric
Dear all,

Although the Flink 1.2 version will roll out a Flink Kafka 0.10.x connector, I 
want to use the Flink 0.9 connector in conjunction with the 0.10.x versions. 

The reason behind this is that we are currently evaluating Flink as part of an 
empirical research project, hence a stable release is required. In addition, the 
reason why we require the Kafka 0.10.x versions is that since 0.10.0 Kafka 
supports consumer and producer interceptors and message timestamps.

To make the 0.9 connector support a Kafka version such as 0.10.0, so far 
I’ve changed the Flink Kafka 0.9 connector dependency to the required Kafka 
version and built the project. However, as I imported the jar and added the 
source to the StreamExecutionEnvironment, a type error occurred stating that the 
addSource function requires a class deriving from the SourceFunction interface. 

Hence, what has gone wrong during the build? I assume a dependency issue.

Next, I’ve tried simply overriding the dependencies of the Flink Kafka 
connector within the project pom.xml; however, there is obviously a slight API 
mismatch, hence this cannot be done. 

I would really appreciate it if anyone could provide some guidance on how to 
successfully build the Flink Kafka connector supporting Kafka 0.10.x versions. 

Thanks in advance,
Dominik