Thanks Milinda. Is this feature available in version 0.8 of Samza?

- Shekar
On Fri, Jun 26, 2015 at 11:24 AM, Milinda Pathirage <mpath...@umail.iu.edu> wrote:

> Hi Shekar,
>
> You can use Samza's local storage
> (http://samza.apache.org/learn/documentation/0.9/container/state-management.html)
> to keep the window state, and its windowing capabilities
> (http://samza.apache.org/learn/documentation/0.9/container/windowing.html)
> to handle window advancement. During advancement you can update the
> local cache (Redis in your case). AFAIK, Samza doesn't provide any
> helpers or utilities to handle window state maintenance. You have to
> implement it on top of local storage, or, if you don't want fault
> tolerance, you can keep the state in memory too (as long as the state
> fits in memory).
>
> Thanks
> Milinda
>
> On Fri, Jun 26, 2015 at 1:53 PM, Shekar Tippur <ctip...@gmail.com> wrote:
>
> > Yan,
> >
> > *What do you mean by "a local cache"? Is it a db like MySQL, something
> > like RocksDB, or even just in-memory?*
> >
> > Local cache as in Redis.
> >
> > *When you say "another topic", is this the topic consumed by the same
> > Samza job as your 5-minutes-job, or in a separate job? What is the
> > relation between the topic and the application name?*
> >
> > We don't have a 5-minute job. All we have now is a stream of events
> > coming from a bunch of applications. All of these land on a raw Kafka
> > topic. The stream data includes the application name. I want to create
> > a job that takes the incoming stream, groups it by application name,
> > and counts the number of events we get in a 5-minute sliding window.
> >
> > - Shekar
> >
> > On Fri, Jun 26, 2015 at 10:29 AM, Yan Fang <yanfang...@gmail.com> wrote:
> >
> > > Hi Shekar,
> > >
> > > Need a little more clarification.
> > >
> > > What do you mean by "a local cache"? Is it a db like MySQL, something
> > > like RocksDB, or even just in-memory?
> > >
> > > When you say "another topic", is this the topic consumed by the same
> > > Samza job as your 5-minutes-job, or in a separate job? What is the
> > > relation between the topic and the application name?
> > >
> > > Thanks,
> > >
> > > Fang, Yan
> > > yanfang...@gmail.com
> > >
> > > On Fri, Jun 26, 2015 at 1:08 AM, Shekar Tippur <ctip...@gmail.com> wrote:
> > >
> > > > Hello,
> > > > My apologies if I have raised this earlier.
> > > > Here is the use case:
> > > > I have a stream that is partitioned based on application name. I
> > > > want to be able to count the number of events happening for that
> > > > particular application in the past 5 minutes (sliding window) and
> > > > update either another topic or a local cache.
> > > >
> > > > Is this possible via the 0.9 version of Samza?
> > > > If not, what is the easiest way to achieve this?
> > > >
> > > > - Shekar
>
> --
> Milinda Pathirage
>
> PhD Student | Research Assistant
> School of Informatics and Computing | Data to Insight Center
> Indiana University
>
> twitter: milindalakmal
> skype: milinda.pathirage
> blog: http://milinda.pathirage.org
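
For illustration, the approach Milinda describes (keep per-application window state and expire old events as the window advances) could look roughly like the plain-Java sketch below. It is only a sketch under stated assumptions: `SlidingWindowCounter`, `record`, and `count` are hypothetical names, not part of Samza; the state is held in memory, so it is not fault tolerant; and event timestamps are assumed to arrive in non-decreasing order per application. In an actual job, `record` would presumably be called from the task's message handler and `count` from its periodic window callback, with the map replaced by a local key-value store.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not a Samza class): per-application event counts over a sliding window. */
class SlidingWindowCounter {
    private final long windowMs;
    // In-memory state only; a fault-tolerant job would back this with a local store.
    private final Map<String, Deque<Long>> timestampsByApp = new HashMap<>();

    SlidingWindowCounter(long windowMs) {
        this.windowMs = windowMs;
    }

    /** Records one event for the application; timestamps are assumed non-decreasing per application. */
    void record(String app, long timestampMs) {
        timestampsByApp.computeIfAbsent(app, k -> new ArrayDeque<>()).addLast(timestampMs);
    }

    /** Evicts events at or before (nowMs - windowMs) and returns how many remain in the window. */
    long count(String app, long nowMs) {
        Deque<Long> timestamps = timestampsByApp.get(app);
        if (timestamps == null) {
            return 0;
        }
        long cutoff = nowMs - windowMs;
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= cutoff) {
            timestamps.removeFirst();
        }
        if (timestamps.isEmpty()) {
            timestampsByApp.remove(app); // do not keep empty state around for idle applications
        }
        return timestamps.size();
    }

    public static void main(String[] args) {
        SlidingWindowCounter counter = new SlidingWindowCounter(5 * 60 * 1000L); // 5-minute window
        counter.record("billing", 0L);
        counter.record("billing", 60_000L);
        counter.record("search", 100_000L);
        System.out.println("billing: " + counter.count("billing", 120_000L));
        System.out.println("search: " + counter.count("search", 120_000L));
    }
}
```

Storing every timestamp keeps the count exact but uses memory proportional to the number of events in the window. If that is too much, one count per application per small time bucket (say, one minute) would be a coarser alternative.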