Joining two Kafka streams

2017-01-08 Thread igor.berman
Hi,
I have a use case where I need to join two Kafka topics by some fields.
In general, I could write the content of one topic into the other and partition
by the same key, but I can't touch those two topics (i.e. there are other
consumers of them). On the other hand, it's essential to process the same keys
on the same "thread" to achieve locality and avoid races when working with the
same key from different machines/threads.

My idea is to union the two streams and then key by the field,
but is there a better approach to achieve "locality"?
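The union-then-keyBy routing can be illustrated with a small stand-alone sketch (plain Python over toy data, not the Flink API; `union`, `key_by`, and the topic contents are all made up for illustration):

```python
from collections import defaultdict

# Toy sketch of union-then-keyBy: events from two "topics" are merged
# into one stream and then routed by key, so all events for a given
# key land in the same partition ("thread"), regardless of source.
topic_a = [{"key": "k1", "src": "a", "v": 1}, {"key": "k2", "src": "a", "v": 2}]
topic_b = [{"key": "k1", "src": "b", "v": 3}, {"key": "k3", "src": "b", "v": 4}]

def union(*streams):
    """Merge several streams into one, preserving per-stream order."""
    for stream in streams:
        yield from stream

def key_by(stream, key_fn, parallelism=4):
    """Route each event to a partition derived from its key:
    same key -> same partition, hence locality and no cross-thread races."""
    partitions = defaultdict(list)
    for event in stream:
        partitions[hash(key_fn(event)) % parallelism].append(event)
    return partitions

partitions = key_by(union(topic_a, topic_b), lambda e: e["key"])

# All events for "k1", whichever topic they came from, share one partition.
k1_partitions = {p for p, events in partitions.items()
                 for e in events if e["key"] == "k1"}
assert len(k1_partitions) == 1
```

In an actual Flink job this would correspond to `streamA.union(streamB).keyBy(...)`, which should give the same guarantee: all events with the same key are handled by the same parallel subtask.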

Any input will be appreciated,
Igor



--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Joining-two-kafka-streams-tp10912.html
Sent from the Apache Flink User Mailing List archive at Nabble.com.


Accessing StateBackend snapshots outside of Flink

2016-04-14 Thread igor.berman
Hi,
we are evaluating Flink for a new solution, and several people raised the
concern of coupling too tightly to Flink:
1. We understand that to get full fault tolerance and the best performance
we'll need to use Flink managed state (probably the RocksDB backend, due to
the volume of state).
2. But if we later find that Flink doesn't answer our needs (for any reason),
we'll need to extract this state somehow, since it's the only source of
consistent state.
In general, I'd like to be able to take a snapshot of the backend and try to
read it... do you think that would be a trivial task?
Say I'm holding list state per partitioned key: would it be easy to take the
RocksDB files and open them?
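For what it's worth, the snapshot files hold state serialized with Flink's own serializers, so opening them outside Flink is unlikely to be trivial. One way to soften the coupling concern (a toy sketch under assumed names, not Flink or RocksDB API) is to also mirror the keyed state into a neutral, self-describing format that any system can read back:

```python
import json

# Toy stand-in for Flink keyed list state: key -> list of events.
# In a real job this would live in RocksDB; here a dict illustrates
# the idea of exporting state to a Flink-independent format.
state = {
    "device-1": [{"ts": 1, "v": 10}, {"ts": 2, "v": 12}],
    "device-2": [{"ts": 1, "v": 7}],
}

def export_state(state):
    """Serialize keyed state to plain JSON lines, one record per key.
    Anything can parse this back without Flink's serializers."""
    return [json.dumps({"key": k, "events": v}, sort_keys=True)
            for k, v in sorted(state.items())]

def import_state(lines):
    """Rebuild the keyed state from the neutral JSON export."""
    restored = {}
    for line in lines:
        rec = json.loads(line)
        restored[rec["key"]] = rec["events"]
    return restored

exported = export_state(state)
assert import_state(exported) == state
```

In a Flink job, such an export could be driven by a periodic dump to a side topic or file; the point is only that the escape hatch is a format you control, not Flink's internal one.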

Any thoughts on how I could convince the people on our team?

thanks in advance!



--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Accessing-StateBackend-snapshots-outside-of-Flink-tp6116.html


Flink event processing immediate feedback

2016-04-07 Thread igor.berman
Hi,
Suppose I have a web-facing frontend that receives a stream of events (HTTP
calls). I need to process the event stream, do some aggregations over those
events, and write the aggregated statistics to HBase; so far Flink seems like
a perfect match.
However, in some cases an event should trigger an alert, and the frontend
needs to get this alert synchronously; here I'm a bit lost. I thought about
a flow along these lines:
frontend -> queue -> flink -> redis(pub/sub) <- frontend

I.e., I have two major use cases: async computation of aggregated
analytics/stats, and a "synchronous" response to the frontend. The frontend
might be Node/Play or any other technology that won't have a problem
"waiting" for the response, so the only question is how to implement this
feedback. Maybe some kind of sink?
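The pub/sub leg of the flow above can be sketched with an in-memory stand-in for Redis (all class and channel names here are illustrative; a real setup would use a Redis client inside a custom sink):

```python
from collections import defaultdict

# Toy in-memory stand-in for Redis pub/sub, illustrating the
# frontend -> queue -> flink -> pub/sub <- frontend feedback loop.
class PubSub:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        for cb in self.subscribers[channel]:
            cb(message)

broker = PubSub()
received = []

# The frontend subscribes on a per-request channel before handing the
# event off to the stream processor, then waits for the reply.
broker.subscribe("alerts:req-42", received.append)

def alert_sink(event):
    """What a custom sink would do: publish the alert back on the
    channel keyed by the originating request id."""
    if event["alert"]:
        broker.publish("alerts:" + event["request_id"], event)

alert_sink({"request_id": "req-42", "alert": True, "reason": "threshold"})
assert received and received[0]["reason"] == "threshold"
```

The key design point is correlating request and response by an id the frontend chose, so the synchronous caller can block on exactly its own channel.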

Any ideas would be appreciated,
Igor




--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-event-processing-immediate-feedback-tp5978.html


Re: Usecase for Flink

2015-12-19 Thread igor.berman
Thanks Stephan,
yes, you got the use case right.



--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Usecase-for-Flink-tp4076p4092.html


Usecase for Flink

2015-12-17 Thread igor.berman
Hi,
We are looking at Flink and trying to understand whether our use case is a
good fit for it.

We need to process a stream of events. Each event belongs to some id (e.g. a
device id), and each event should be
1. stored in some persistent storage (e.g. Cassandra)
2. combined with the previously persisted events: some computation over the
whole history may or may not trigger other events (e.g. sending an email)

So yes, we have a stream of events, but we need a persistent store (i.e. an
external service) in the middle, and there is no aggregation of these events
into something smaller that could be kept in memory: the number of ids might
be huge, and the history of events per id can be considerable, so there is
no way to store everything in memory.
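The per-id flow described above can be sketched as follows (a toy model: a dict stands in for Cassandra, and the trigger condition is invented for illustration):

```python
# Toy sketch of the per-id flow: persist each event, read back the
# full history for that id, and decide whether to trigger a follow-up
# action. A dict stands in for Cassandra; the threshold is made up.
store = {}      # device id -> list of persisted events
triggered = []  # side effects the computation decided to fire

def handle_event(device_id, event, threshold=3):
    # 1. persist the event in the external store
    history = store.setdefault(device_id, [])
    history.append(event)
    # 2. compute over the whole history; maybe trigger another event
    if len(history) >= threshold:
        triggered.append(("email", device_id, len(history)))

for i in range(4):
    handle_event("device-7", {"seq": i})

assert store["device-7"][-1] == {"seq": 3}
assert ("email", "device-7", 3) in triggered
```

Because the history lives in the external store rather than in memory, the number of ids is bounded only by that store; the stream processor just needs to route all events for one id to the same worker.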

I was wondering whether Akka Streams might be a viable alternative too.

please share your ideas :)
thanks in advance,
Igor



--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Usecase-for-Flink-tp4076.html