Re: [ANNOUNCE] Kostas Kloudas joins the Flink PMC

2019-09-06 Thread Kenny Gorman
Grats Kostas! ^5^5

 

-kg

 

From: Fabian Hueske 
Date: Friday, September 6, 2019 at 7:56 AM
To: dev , user 
Subject: [ANNOUNCE] Kostas Kloudas joins the Flink PMC

 

Hi everyone,

 

I'm very happy to announce that Kostas Kloudas is joining the Flink PMC.

Kostas has been contributing to Flink for many years and puts a lot of effort into 
helping our users and growing the Flink community.

 

Please join me in congratulating Kostas!

 

Cheers,

Fabian



Re: Read mongo datasource in Flink

2019-04-29 Thread Kenny Gorman
Just a thought: a robust, high-performance way to potentially achieve your 
goals is:

Debezium->Kafka->Flink

https://debezium.io/docs/connectors/mongodb/

You get robust handling of various topologies, reasonably good scaling properties, 
good restartability, and so on.
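
A minimal sketch, purely for illustration, of what the Flink end of that pipeline could 
look like, assuming Debezium is already publishing MongoDB change events to a Kafka topic 
and the universal Flink Kafka connector is on the classpath (the broker address, group id, 
and topic name below are placeholders):

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class MongoCdcJob {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");  // placeholder broker
        props.setProperty("group.id", "mongo-cdc-reader");         // placeholder group id

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Each record is one Debezium change event serialized as JSON;
        // parse and filter it downstream as needed.
        DataStream<String> changeEvents = env.addSource(
            new FlinkKafkaConsumer<>("dbserver1.inventory.customers",  // placeholder topic
                                     new SimpleStringSchema(),
                                     props));

        changeEvents.print();
        env.execute("mongo-cdc-via-debezium");
    }
}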

Thanks
Kenny Gorman
Co-Founder and CEO
www.eventador.io



> On Apr 29, 2019, at 7:47 AM, Wouter Zorgdrager wrote:
> 
> Yes, that is correct. This is a really basic implementation that doesn't take 
> parallelism into account. I think you need something like this [1] to get 
> that working.
> 
> [1]: https://docs.mongodb.com/manual/reference/command/parallelCollectionScan/#dbcmd.parallelCollectionScan
> On Mon, 29 Apr 2019 at 14:37, Flavio Pompermaier wrote:
> But what about parallelism with this implementation? From what I see, there is 
> only a single thread querying Mongo and fetching all the data... am I wrong?
> 
> On Mon, Apr 29, 2019 at 2:05 PM Wouter Zorgdrager wrote:
> For a framework I'm working on, we actually implemented a (basic) Mongo 
> source [1]. It's written in Scala and uses Json4s [2] to parse the data into 
> a case class. It uses a Mongo observer to iterate over a collection and emit 
> each document into a Flink context.
> 
> Cheers,
> Wouter
> 
> [1]: https://github.com/codefeedr/codefeedr/blob/develop/codefeedr-plugins/codefeedr-mongodb/src/main/scala/org/codefeedr/plugins/mongodb/BaseMongoSource.scala
>  
> [2]: http://json4s.org/
> On Mon, 29 Apr 2019 at 13:57, Flavio Pompermaier wrote:
> I'm not aware of an official source/sink; if you want, you could try to 
> use the Mongo HadoopInputFormat as in [1].
> The linked example uses a pretty old version of Flink, but it should not be a 
> big problem to update the Maven dependencies and the code to a newer version.
> 
> Best,
> Flavio
> 
> [1] https://github.com/okkam-it/flink-mongodb-test
> On Mon, Apr 29, 2019 at 6:15 AM Hai wrote:
> Hi,
> 
> Can anyone give me a clue about how to read MongoDB data as a 
> batch/streaming data source in Flink? I don't find a MongoDB connector in the 
> recent release versions.
> 
> Many thanks
> 
> 
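
To make the single-thread limitation discussed above concrete, here is a minimal, 
hypothetical Java sketch of the kind of basic Mongo source Wouter describes (his actual 
implementation is the Scala code linked as [1]); the connection URI, database, and 
collection names are placeholders, and nothing here is checkpointed or parallel:

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
import org.bson.Document;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;

// Emits every document of one collection as a JSON string. Because run() is a
// single loop over one cursor, the source effectively runs with parallelism 1,
// which is the limitation Flavio points out.
public class SimpleMongoSource extends RichSourceFunction<String> {

    private transient MongoClient client;
    private volatile boolean running = true;

    @Override
    public void open(Configuration parameters) {
        client = MongoClients.create("mongodb://localhost:27017");  // placeholder URI
    }

    @Override
    public void run(SourceContext<String> ctx) {
        MongoCollection<Document> coll =
            client.getDatabase("mydb").getCollection("mycollection");  // placeholder names
        for (Document doc : coll.find()) {
            if (!running) {
                break;
            }
            ctx.collect(doc.toJson());
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    @Override
    public void close() {
        if (client != null) {
            client.close();
        }
    }
}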



Re: Flink vs Spark streaming benchmark

2017-12-17 Thread Kenny Gorman
Nice write-up!

-kg

> On Dec 16, 2017, at 4:59 PM, Fabian Hueske  wrote:
> 
> Hi,
> 
> In case you haven't seen it yet.
> Here's an analysis and response to Databricks' benchmark [1].
> 
> Best, Fabian
> 
> [1] 
> https://data-artisans.com/blog/curious-case-broken-benchmark-revisiting-apache-flink-vs-databricks-runtime
> 
> 2017-11-13 11:44 GMT+01:00 G.S.Vijay Raajaa :
>> Hi Guys,
>> 
>> I have been using Flink for quite some time now, and recently I hit upon a 
>> benchmark result that was published by Databricks.
>> 
>> Would love to hear your thoughts - 
>> https://databricks.com/blog/2017/10/11/benchmarking-structured-streaming-on-databricks-runtime-against-state-of-the-art-streaming-systems.html
>> 
>> Regards,
>> Vijay Raajaa G S
> 


Re: Stumped writing to KafkaJSONSink

2017-10-18 Thread Kenny Gorman
Yep, we hung out and got it working. I should have replied sooner! Thx for the 
reply.

-kg

> On Oct 18, 2017, at 7:06 AM, Fabian Hueske  wrote:
> 
> Hi Kenny,
> 
> this looks almost correct.
> The Table class has a method writeToSink(TableSink) that should address your 
> use case (so the same as yours but without the TableEnvironment argument).
> 
> Does that work for you?
> If not what kind of error and error message do you get?
> 
> Best, Fabian
> 
> 2017-10-18 1:28 GMT+02:00 Kenny Gorman :
>> I am hoping you guys can help me. I am stumped about how to actually write to 
>> Kafka with a Kafka09JsonTableSink using the Table API. Here is my code below; 
>> I am hoping you guys can shed some light on how this should be done. I don't 
>> see any method for the actual write to Kafka. I am probably doing something 
>> stupid. TIA.
>> 
>> Thanks!
>> Kenny
>> 
>> // run some SQL to filter results where a key is not null
>> String sql = "SELECT icao FROM flights WHERE icao is not null";
>> tableEnv.registerTableSource("flights", kafkaTableSource);
>> Table result = tableEnv.sql(sql);
>> 
>> // create a partition for the data going into kafka
>> FlinkFixedPartitioner partition =  new FlinkFixedPartitioner();
>> 
>> // create new tablesink of JSON to kafka
>> KafkaJsonTableSink kafkaTableSink = new Kafka09JsonTableSink(
>> params.getRequired("write-topic"),
>> params.getProperties(),
>> partition);
>> 
>> result.writeToSink(tableEnv, kafkaTableSink);  // Logically, I want to do 
>> this, but no such method..
> 
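
Applying Fabian's suggestion, the last line of the snippet above would presumably 
become simply:

// writeToSink is called on the Table itself and takes only the sink,
// not the TableEnvironment.
result.writeToSink(kafkaTableSink);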


Stumped writing to KafkaJSONSink

2017-10-17 Thread Kenny Gorman
I am hoping you guys can help me. I am stumped about how to actually write to Kafka 
with a Kafka09JsonTableSink using the Table API. Here is my code below; I am hoping 
you guys can shed some light on how this should be done. I don't see any method for 
the actual write to Kafka. I am probably doing something stupid. 
TIA.

Thanks!
Kenny

// run some SQL to filter results where a key is not null
String sql = "SELECT icao FROM flights WHERE icao is not null";
tableEnv.registerTableSource("flights", kafkaTableSource);
Table result = tableEnv.sql(sql);

// create a partition for the data going into kafka
FlinkFixedPartitioner partition =  new FlinkFixedPartitioner();

// create new tablesink of JSON to kafka
KafkaJsonTableSink kafkaTableSink = new Kafka09JsonTableSink(
params.getRequired("write-topic"),
params.getProperties(),
partition);

result.writeToSink(tableEnv, kafkaTableSink);  // Logically, I want to do this, 
but no such method..