Milinda, I see that the document you mentioned addresses windowing, but I also need to group by application.
Application      Count
---------------  --------
A                100
B                40
C                69
....

- Shekar

On Fri, Jun 26, 2015 at 11:39 AM, Shekar Tippur <ctip...@gmail.com> wrote:

> Never mind. I see it here:
>
> http://samza.apache.org/learn/documentation/0.8/container/windowing.html
>
> Thanks again Milinda.
>
> - Shekar
>
> On Fri, Jun 26, 2015 at 11:39 AM, Shekar Tippur <ctip...@gmail.com> wrote:
>
>> Thanks Milinda.
>> Is this feature available on the 0.8 version of Samza?
>>
>> - Shekar
>>
>> On Fri, Jun 26, 2015 at 11:24 AM, Milinda Pathirage <mpath...@umail.iu.edu> wrote:
>>
>>> Hi Shekar,
>>>
>>> You can use Samza's local storage
>>> (http://samza.apache.org/learn/documentation/0.9/container/state-management.html)
>>> to keep the window state and windowing
>>> (http://samza.apache.org/learn/documentation/0.9/container/windowing.html)
>>> capabilities to handle the window advancement. During advancement you can
>>> update the local cache (Redis in your case). AFAIK, Samza doesn't provide
>>> any helpers or utilities to handle window state maintenance. You have to
>>> implement it on top of local storage, or if you don't want fault tolerance
>>> you can keep the state in-memory too (as long as the state fits in memory).
>>>
>>> Thanks
>>> Milinda
>>>
>>> On Fri, Jun 26, 2015 at 1:53 PM, Shekar Tippur <ctip...@gmail.com> wrote:
>>>
>>> > Yan,
>>> >
>>> > *What do you mean by "a local cache"? Is it a db like MySQL, something
>>> > like RocksDB, or even just in-memory?*
>>> >
>>> > Local cache as in Redis
>>> >
>>> > *When you say "another topic", is this the topic consumed by the same
>>> > Samza job as your 5-minutes-job, or in a separate job? What is the
>>> > relation between the topic and the application name*
>>> >
>>> > We don't have a 5 min job. All we have now is a stream of events coming
>>> > from a bunch of applications. All these land on a raw Kafka topic. The
>>> > stream data has the application name. I want to create a job that takes
>>> > the incoming stream, groups it by application name, and counts the number
>>> > of events we get in a 5 min sliding window.
>>> >
>>> > - Shekar
>>> >
>>> > On Fri, Jun 26, 2015 at 10:29 AM, Yan Fang <yanfang...@gmail.com> wrote:
>>> >
>>> > > Hi Shekar,
>>> > >
>>> > > Need a little more clarification.
>>> > >
>>> > > What do you mean by "a local cache"? Is it a db like MySQL, something
>>> > > like RocksDB, or even just in-memory?
>>> > >
>>> > > When you say "another topic", is this the topic consumed by the same
>>> > > Samza job as your 5-minutes-job, or in a separate job? What is the
>>> > > relation between the topic and the application name?
>>> > >
>>> > > Thanks,
>>> > >
>>> > > Fang, Yan
>>> > > yanfang...@gmail.com
>>> > >
>>> > > On Fri, Jun 26, 2015 at 1:08 AM, Shekar Tippur <ctip...@gmail.com> wrote:
>>> > >
>>> > > > Hello,
>>> > > > My apologies if I have raised it earlier.
>>> > > > Here is the use case:
>>> > > > I have a stream that is partitioned based on application name. I
>>> > > > want to be able to count the number of events happening for that
>>> > > > particular application in the past 5 minutes (sliding window) and
>>> > > > update either another topic or a local cache.
>>> > > >
>>> > > > Is this possible via the 0.9 version of Samza?
>>> > > > If not, what is the easiest way to achieve this?
>>> > > >
>>> > > > - Shekar
>>>
>>> --
>>> Milinda Pathirage
>>>
>>> PhD Student | Research Assistant
>>> School of Informatics and Computing | Data to Insight Center
>>> Indiana University
>>>
>>> twitter: milindalakmal
>>> skype: milinda.pathirage
>>> blog: http://milinda.pathirage.org
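
For reference, here is a minimal sketch of the approach discussed in this thread: per-application event counts over a 5-minute sliding window, with the window state kept in small time buckets. It is plain Python, not Samza code, and every name in it (SlidingWindowCounter, record, advance, the 60-second bucket size) is hypothetical and chosen only for illustration. In a Samza job, record() would roughly correspond to what the task does for each incoming message, and advance() to what it does on each window tick; the in-memory dict stands in for whatever store is actually used (local storage, or Redis as mentioned above). Counts are approximate at bucket granularity, since a bucket is only dropped once it lies entirely outside the window.

```python
from collections import defaultdict


class SlidingWindowCounter:
    """Per-application event counts over a sliding window, kept in fixed-size buckets."""

    def __init__(self, window_seconds=300, bucket_seconds=60):
        self.window_seconds = window_seconds
        self.bucket_seconds = bucket_seconds
        # application name -> {bucket start time (seconds) -> event count}
        self.buckets = defaultdict(lambda: defaultdict(int))

    def record(self, app, timestamp):
        """Count one event for `app` in the bucket that contains `timestamp`."""
        bucket_start = int(timestamp) // self.bucket_seconds * self.bucket_seconds
        self.buckets[app][bucket_start] += 1

    def advance(self, now):
        """Evict expired buckets and return the current count per application."""
        oldest_allowed = now - self.window_seconds
        counts = {}
        for app in list(self.buckets):
            per_app = self.buckets[app]
            for bucket_start in list(per_app):
                # Drop a bucket once it lies entirely outside the window.
                if bucket_start + self.bucket_seconds <= oldest_allowed:
                    del per_app[bucket_start]
            if per_app:
                counts[app] = sum(per_app.values())
            else:
                # No live buckets left for this application; forget it.
                del self.buckets[app]
        return counts
```

Usage would look something like this: call counter.record(app_name, event_time) for every event, then call counter.advance(current_time) on a timer (say, once a minute) and write the returned dict of application -> count to the output topic or cache. A smaller bucket size gives a tighter approximation of the 5-minute window at the cost of more state per application.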