Re: Joining two kafka streams

2017-01-09 Thread Igor Berman
Hi Tzu-Li,
Huge thanks for the input. I'll try to implement a prototype of your idea and
see if it meets my requirements.




Re: Joining two kafka streams

2017-01-08 Thread Tzu-Li (Gordon) Tai
Hi Igor!

What you can actually do is let a single FlinkKafkaConsumer consume from both 
topics, producing a single DataStream which you can keyBy afterwards.
All versions of the FlinkKafkaConsumer support consuming multiple Kafka topics 
simultaneously. This is logically the same as union and then a keyBy, like what 
you described.
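
To make the locality argument concrete, here is a minimal plain-Java model of "union, then keyBy". It is not Flink code and needs no Flink dependency. The `Event` type, the topic names, and the `partitionFor` routing function are assumptions made for illustration only; Flink's real key-to-subtask assignment is more involved, but it shares the one property that matters here: equal keys always go to the same place, whichever topic they came from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of "union, then keyBy"; not Flink code.
final class KeyedUnionSketch {

    // Hypothetical record type shared by both topics.
    static final class Event {
        final String topic;
        final String key;
        final String payload;

        Event(String topic, String key, String payload) {
            this.topic = topic;
            this.key = key;
            this.payload = payload;
        }

        @Override
        public String toString() {
            return topic + ":" + key + "=" + payload;
        }
    }

    // Simplified stand-in for keyBy routing: equal keys always map to the same index.
    static int partitionFor(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Merge both inputs, then group the merged records by target partition.
    static Map<Integer, List<Event>> unionAndKeyBy(
            List<Event> topicA, List<Event> topicB, int parallelism) {
        List<Event> union = new ArrayList<>(topicA);
        union.addAll(topicB);
        Map<Integer, List<Event>> partitions = new TreeMap<>();
        for (Event e : union) {
            partitions
                .computeIfAbsent(partitionFor(e.key, parallelism), p -> new ArrayList<>())
                .add(e);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Event> topicA = Arrays.asList(
            new Event("topicA", "user-1", "a1"),
            new Event("topicA", "user-2", "a2"));
        List<Event> topicB = Arrays.asList(
            new Event("topicB", "user-1", "b1"),
            new Event("topicB", "user-2", "b2"));
        // Records for the same key land in the same partition, whatever their topic.
        unionAndKeyBy(topicA, topicB, 4)
            .forEach((p, events) -> System.out.println("partition " + p + " -> " + events));
    }
}
```

Because all records for a key end up in one partition, a single task handles them, which is what avoids races between machines or threads.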

Note that this approach requires that the records in both of your Kafka topics 
are of the same type when consumed into Flink (e.g., the same POJO class, or 
simply both as Strings).
If that isn’t possible and you have different data types / schemas for the 
topics, you’d probably need to use “connect” and then a keyBy.
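
As a rough illustration of the “connect” case, the following plain-Java model (again not Flink code) keeps one handler per input type while both handlers read and write the same per-key state. The `Order` and `Click` types, their fields, and the handler names are hypothetical; in Flink the equivalent logic would live in a two-input function applied to the connected, keyed streams.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of "connect, then keyBy" for two differently typed inputs; not Flink code.
final class ConnectedKeyedSketch {

    // Hypothetical record type for the first topic.
    static final class Order {
        final String userId;
        final long amount;

        Order(String userId, long amount) {
            this.userId = userId;
            this.amount = amount;
        }
    }

    // Hypothetical record type for the second topic.
    static final class Click {
        final String userId;
        final String url;

        Click(String userId, String url) {
            this.userId = userId;
            this.url = url;
        }
    }

    // State scoped per key and shared by both inputs.
    private final Map<String, List<String>> stateByKey = new HashMap<>();

    // Handler for records of the first type, keyed by userId.
    void onOrder(Order order) {
        stateFor(order.userId).add("order:" + order.amount);
    }

    // Handler for records of the second type, keyed by the same field.
    void onClick(Click click) {
        stateFor(click.userId).add("click:" + click.url);
    }

    // Returns (creating if needed) the state list for one key.
    List<String> stateFor(String key) {
        return stateByKey.computeIfAbsent(key, k -> new ArrayList<>());
    }

    public static void main(String[] args) {
        ConnectedKeyedSketch op = new ConnectedKeyedSketch();
        op.onOrder(new Order("user-1", 30));
        op.onClick(new Click("user-1", "/cart"));
        op.onClick(new Click("user-2", "/home"));
        System.out.println(op.stateFor("user-1"));
        System.out.println(op.stateFor("user-2"));
    }
}
```

The point of the sketch is only that the two inputs keep their own types yet meet in the same keyed state, so no common POJO is required.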

If you’re applying a window directly after joining the two topic streams, you 
could also use a window join:

    dataStream.join(otherStream)
        .where(/* KeySelector for the first stream */)
        .equalTo(/* KeySelector for the second stream */)
        .window(TumblingEventTimeWindows.of(Time.seconds(3)))
        .apply(new JoinFunction<IN1, IN2, OUT>() {...});

The “where” specifies how to select the key from the first stream, and 
“equalTo” does the same for the second one.
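
To show what such a window join computes, here is a small plain-Java sketch of a tumbling-window inner join: two records are paired when they share a key and fall into the same 3-second window. It is a model of the semantics, not Flink code. The `Rec` type, its fields, and the sample data are assumptions, and one record type is used for both inputs only to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of a tumbling-window inner join; not Flink code.
final class TumblingWindowJoinSketch {

    // Hypothetical timestamped record, used for both inputs to keep the sketch short.
    static final class Rec {
        final String key;
        final long timestampMs;
        final String value;

        Rec(String key, long timestampMs, String value) {
            this.key = key;
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    // Start of the tumbling window that contains the timestamp (no offset).
    static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - Math.floorMod(timestampMs, windowSizeMs);
    }

    // Composite (key, window start) bucket; List gives value-based equals/hashCode.
    private static List<Object> bucket(Rec r, long windowSizeMs) {
        return Arrays.<Object>asList(r.key, windowStart(r.timestampMs, windowSizeMs));
    }

    // Emits one "left+right" result per pair sharing both key and window.
    static List<String> join(List<Rec> first, List<Rec> second, long windowSizeMs) {
        Map<List<Object>, List<Rec>> index = new HashMap<>();
        for (Rec r : second) {
            index.computeIfAbsent(bucket(r, windowSizeMs), b -> new ArrayList<>()).add(r);
        }
        List<String> joined = new ArrayList<>();
        for (Rec l : first) {
            List<Rec> matches =
                index.getOrDefault(bucket(l, windowSizeMs), Collections.<Rec>emptyList());
            for (Rec r : matches) {
                joined.add(l.value + "+" + r.value);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Rec> first = Arrays.asList(
            new Rec("k1", 1000, "a"), new Rec("k1", 3500, "b"), new Rec("k2", 500, "c"));
        List<Rec> second = Arrays.asList(
            new Rec("k1", 2500, "x"), new Rec("k1", 4000, "y"), new Rec("k2", 3100, "z"));
        // "c" and "z" share a key but fall into different windows, so they do not join.
        System.out.println(join(first, second, 3000)); // prints [a+x, b+y]
    }
}
```

Note the consequence for records near a window boundary: matching keys that land in adjacent windows are never paired, which is worth checking against your requirements before choosing a window join.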

Hope this helps, let me know if you have other questions!

Cheers,
Gordon

On January 9, 2017 at 4:06:34 AM, igor.berman (igor.ber...@gmail.com) wrote:

Hi,
I have a use case where I need to join two Kafka topics by some fields.
In general, I could copy the content of one topic into the other and partition by
the same key, but I can't touch those two topics (i.e., there are other consumers
of those topics). On the other hand, it's essential to process the same keys on the
same "thread" to achieve locality and to avoid races when working with the
same key from different machines/threads.

My idea is to use a union of the two streams and then key by the field,
but is there a better approach to achieve "locality"?

Any input will be appreciated.
Igor


