Milinda,
I see that the document you mentioned addresses windowing but I also need
to group by different applications.
Application  Count
-----------  -----
A            100
B            40
C            69
- Shekar
On Fri, Jun 26, 2015 at 11:39 AM, Shekar Tippur ctip...@gmail.com wrote:
Never mind. I see it here:
http://samza.apache.org/learn/documentation/0.8/container/windowing.html
Thanks again Milinda.
- Shekar
On Fri, Jun 26, 2015 at 11:39 AM, Shekar Tippur ctip...@gmail.com wrote:
Thanks Milinda.
Is this feature available in version 0.8 of Samza?
- Shekar
On Fri, Jun 26, 2015 at 11:24 AM, Milinda Pathirage
mpath...@umail.iu.edu wrote:
Hi Shekar,
You can use Samza's local storage (
http://samza.apache.org/learn/documentation/0.9/container/state-management.html
)
to keep the window state, and its windowing capabilities (
http://samza.apache.org/learn/documentation/0.9/container/windowing.html
)
to handle window advancement. During advancement you can update the
local cache (Redis in your case). AFAIK, Samza doesn't provide any
helpers or utilities for window state maintenance. You have to
implement it on top of local storage or, if you don't want fault
tolerance, you can keep the state in memory (as long as the state fits
in memory).
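A minimal sketch of this approach, in plain Python rather than Samza's Java API, might look like the following. The class name is hypothetical, and its `process` and `window` methods only loosely mirror the per-message and periodic callbacks described in the linked docs. `store` and `cache` are plain dicts standing in for the local key-value store and Redis, and the 1-minute bucket size is an assumption.

```python
from collections import defaultdict

WINDOW_SECONDS = 5 * 60  # sliding window length from the thread
BUCKET_SECONDS = 60      # assumed granularity: the window slides in 1-minute steps


class SlidingWindowCounter:
    """Per-application event counts over a sliding window, kept in time buckets.

    `store` stands in for the local key-value store and `cache` for the
    external cache (Redis in the thread); both are plain dicts here.
    """

    def __init__(self, store=None, cache=None):
        self.store = store if store is not None else {}
        self.cache = cache if cache is not None else {}

    def process(self, app, event_time):
        """Called once per event: increment the bucket the event falls into."""
        bucket = int(event_time // BUCKET_SECONDS)
        key = (app, bucket)
        self.store[key] = self.store.get(key, 0) + 1

    def window(self, now):
        """Called periodically: drop expired buckets and publish fresh totals."""
        buckets_per_window = WINDOW_SECONDS // BUCKET_SECONDS
        oldest_live = int(now // BUCKET_SECONDS) - buckets_per_window + 1
        totals = defaultdict(int)
        for (app, bucket), count in list(self.store.items()):
            if bucket < oldest_live:
                del self.store[(app, bucket)]  # bucket slid out of the window
            else:
                totals[app] += count
        # Applications already in the cache but now silent are reset to zero.
        for app in self.cache:
            self.cache[app] = totals.get(app, 0)
        self.cache.update(totals)
        return dict(totals)
```

Because counts are kept per bucket, the result is accurate only to the bucket size: a smaller `BUCKET_SECONDS` gives a smoother window at the cost of more keys in the store.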
Thanks
Milinda
On Fri, Jun 26, 2015 at 1:53 PM, Shekar Tippur ctip...@gmail.com
wrote:
Yan,
*What do you mean by a local cache? Is it a db like MySQL, something
like RocksDB, or even just in-memory?*
Local cache as in Redis
*When you say another topic, is this the topic consumed by the same
Samza job as your 5-minutes-job, or in a separate job? What is the
relation between the topic and the application name*
We don't have a 5-minute job. All we have now is a stream of events
coming from a bunch of applications. All of these land on a raw Kafka
topic. The stream data includes the application name. I want to create
a job that takes the incoming stream, groups it by application name,
and counts the number of events we get in a 5-minute sliding window.
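For illustration, the grouping and counting described here could be sketched as follows in plain Python, independent of Samza. The event fields `application` and `timestamp` and the class name are hypothetical, and the sketch assumes events for each application arrive in timestamp order.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 5 * 60  # 5-minute sliding window


class AppEventWindow:
    """Exact sliding-window counts, keeping one timestamp per event in memory."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        self.timestamps = defaultdict(deque)  # application name -> event times

    def add(self, event):
        """Group an incoming event under its application name."""
        self.timestamps[event["application"]].append(event["timestamp"])

    def counts(self, now):
        """Return the number of events per application in the last window."""
        cutoff = now - self.window_seconds
        result = {}
        for app, stamps in self.timestamps.items():
            # Evict timestamps that have fallen out of the window.
            while stamps and stamps[0] <= cutoff:
                stamps.popleft()
            result[app] = len(stamps)
        return result
```

This keeps every timestamp inside the window, so memory grows with the event rate; it is only reasonable while the state fits in memory.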
- Shekar
On Fri, Jun 26, 2015 at 10:29 AM, Yan Fang yanfang...@gmail.com
wrote:
Hi Shekar,
Need a little more clarification.
What do you mean by a local cache? Is it a db like MySQL, something
like RocksDB, or even just in-memory?
When you say another topic, is this the topic consumed by the same
Samza job as your 5-minutes-job, or in a separate job? What is the
relation between the topic and the application name?
Thanks,
Fang, Yan
yanfang...@gmail.com
On Fri, Jun 26, 2015 at 1:08 AM, Shekar Tippur ctip...@gmail.com
wrote:
Hello,
My apologies if I have raised this earlier.
Here is the use case:
I have a stream that is partitioned based on application name. I want
to be able to count the number of events happening for that particular
application in the past 5 minutes (sliding window) and update either
another topic or a local cache.
Is this possible with version 0.9 of Samza?
If not, what is the easiest way to achieve this?
- Shekar
--
Milinda Pathirage
PhD Student | Research Assistant
School of Informatics and Computing | Data to Insight Center
Indiana University
twitter: milindalakmal
skype: milinda.pathirage
blog: http://milinda.pathirage.org