Milinda,

I see that the document you mentioned addresses windowing but I also need
to group by different applications.

Application    Count
---------------    --------
A                    100
B                    40
C                    69
....

- Shekar

On Fri, Jun 26, 2015 at 11:39 AM, Shekar Tippur <ctip...@gmail.com> wrote:

> Never mind. I see it here:
>
> http://samza.apache.org/learn/documentation/0.8/container/windowing.html
>
> Thanks again Milinda.
>
> - Shekar
>
> On Fri, Jun 26, 2015 at 11:39 AM, Shekar Tippur <ctip...@gmail.com> wrote:
>
>> Thanks Milinda.
>> Is this feature available on 0.8 version of Samza?
>>
>> - Shekar
>>
>> On Fri, Jun 26, 2015 at 11:24 AM, Milinda Pathirage <
>> mpath...@umail.iu.edu> wrote:
>>
>>> Hi Shekar,
>>>
>>> You can use Samza's local storage (
>>>
>>> http://samza.apache.org/learn/documentation/0.9/container/state-management.html
>>> )
>>> to keep the window state and windowing (
>>> http://samza.apache.org/learn/documentation/0.9/container/windowing.html
>>> )
>>> capabilities to handle the window advancement. During advancement you can
>>> update the local cache (Redis in your case). AFAIK, Samza doesn't provide
>>> any helpers or utilities to handle window state maintenance. You have to
>>> implement it on top of local storage or if you don't won't fault
>>> tolerance
>>> you can keep the state in-memory too (as long as the state fit in
>>> memory).
>>>
>>> Thanks
>>> Milinda
>>>
>>> On Fri, Jun 26, 2015 at 1:53 PM, Shekar Tippur <ctip...@gmail.com>
>>> wrote:
>>>
>>> > Yan,
>>> >
>>> >
>>> > *What do you mean by "a local cache"? Is it a db like MySQL, something
>>> > likeRocksDB, or even just in-memory?*
>>> >
>>> > Local cache as in Redis
>>> >
>>> >
>>> >
>>> > *When you say "another topic", is this the topic consumed by the same
>>> > Samzajob as your 5-minutes-job, or in a separate job? What is the
>>> > relationbetween the topic and the application name*
>>> >
>>> > We dont have a 5 min job. All we have now is a stream of events coming
>>> from
>>> > a bunch of applications. All these land on a raw kafka topic. The
>>> stream
>>> > data has application name. I want to create a job that takes incoming
>>> > stream and group it by application name and count the number of events
>>> we
>>> > get in a 5 min sliding window.
>>> >
>>> > - Shekar
>>> >
>>> > On Fri, Jun 26, 2015 at 10:29 AM, Yan Fang <yanfang...@gmail.com>
>>> wrote:
>>> >
>>> > > Hi Shekar,
>>> > >
>>> > > Need a little more clarification.
>>> > >
>>> > > What do you mean by "a local cache"? Is it a db like MySQL, something
>>> > like
>>> > > RocksDB, or even just in-memory?
>>> > >
>>> > > When you say "another topic", is this the topic consumed by the same
>>> > Samza
>>> > > job as your 5-minutes-job, or in a separate job? What is the relation
>>> > > between the topic and the application name?
>>> > >
>>> > > Thanks,
>>> > >
>>> > > Fang, Yan
>>> > > yanfang...@gmail.com
>>> > >
>>> > > On Fri, Jun 26, 2015 at 1:08 AM, Shekar Tippur <ctip...@gmail.com>
>>> > wrote:
>>> > >
>>> > > > Hello,
>>> > > > My apologies if I have raised it earlier.
>>> > > > Here is the use case:
>>> > > > I have a stream that is partitioned based on application name. I
>>> want
>>> > to
>>> > > be
>>> > > > able to count hte number of events happening for that particular
>>> > > > application in the past 5 minutes (sliding window) and update
>>> either
>>> > > > another topic or a local cache.
>>> > > >
>>> > > > Is this possible via 0.9 version of Samza?
>>> > > > If not, what is the easiest way to achieve this?
>>> > > >
>>> > > > - Shekar
>>> > > >
>>> > >
>>> >
>>>
>>>
>>>
>>> --
>>> Milinda Pathirage
>>>
>>> PhD Student | Research Assistant
>>> School of Informatics and Computing | Data to Insight Center
>>> Indiana University
>>>
>>> twitter: milindalakmal
>>> skype: milinda.pathirage
>>> blog: http://milinda.pathirage.org
>>>
>>
>>
>

Reply via email to