Hi,

Are there any details available about the new key/value store Samza provides?
I'm especially interested in the design and how it handles scale, consistency guarantees, etc.

Tim

On Aug 31, 2013, at 12:56 PM, Alex The Rocker <[email protected]> wrote:

> Chris,
> 
> Thank you very much for your detailed reply.
> Another system for processing real-time data just came to my attention
> (thanks to Kafka mailing list, again).
> It's called Druid (more at: http://druid.io).
> 
> While I now understand Samza advantages over Storm for building a CEP, I am
> wondering how Samza compares to Druid.
> I guess I may not be alone in wondering about Samza vs. Druid, so you may
> want to add a "Samza vs. Druid" item to the Samza documentation :)
> 
> Thanks,
> Alex.
> 
> 
> 
> 
> On Sun, Aug 25, 2013 at 5:26 PM, Chris Riccomini 
> <[email protected]> wrote:
> 
>> Hey Alex,
>> 
>> As I understand it, the CEP pattern you're describing is, "look for a series
>> of events within some bounded time frame, and take an action based on the
>> combination of events." You use an example of three events arriving within
>> 10 minutes of each other, consecutively. Wikipedia uses a similar example
>> (wedding bell event + man in suit event + woman in white dress event +
>> rice thrown event = wedding) on their CEP page.
>> 
>> This pattern can be implemented in Samza fairly easily using Samza's
>> key/value store (or some other StorageEngine, if you choose to implement
>> it). It's best to use a key/value store for this use case, since the
>> window might be quite long (10 minutes), and all events in the window
>> might not fit in memory. If you use Samza's key/value store, you can put
>> each message (and a timestamp) into the key/value store as the messages
>> arrive. You can then implement the WindowableTask interface along with the
>> StreamTask interface, and configure Samza to call window() on your task
>> every N seconds (say, task.window.ms=60000). The window method could then
>> do a range query on the key/value store, and check for message chains
>> (e.g. E1 -> E2 -> E3) that were last updated more than 10 minutes ago. If an
>> expected message was missing, you could then take some action (send an
>> alert, or whatever).
>> 
>> In general, when I think CEP, I think Esper (http://esper.codehaus.org/).
>> You should be able to implement a lot of CEP/SQL type commands (SELECT,
>> JOIN, COUNT, SUM, DISTINCT, WHERE, GROUP BY, HAVING, WINDOW, ORDER, etc)
>> using Samza's StreamTask interface and its state management facilities.
>> 
>> Beyond state management, most features in Samza enable CEP processing, in
>> one way or another. From your perspective, you can look at Samza as the
>> underlying framework with which you might choose to implement a CEP type
>> system (think MapReduce is to Hive as Samza is to a CEP system). Specific
>> things that help are its WindowableTask interface, the partitioning model
>> (which lends itself to distributed joins and aggregation), and Samza's
>> state management features.
>> 
>> One thing to be aware of right now is Samza's "at least once" messaging
>> guarantee when failures occur (inherited from Kafka). You might receive
>> duplicate messages. This means you can potentially double count, if you're
>> doing aggregation. In the example you give (E1, E2, E3), this shouldn't be
>> a problem. We have plans to provide exactly once messaging, but we haven't
>> implemented the feature yet.
>> 
>> Cheers,
>> Chris
>> 
>> On 8/24/13 12:05 PM, "Alex The Rocker" <[email protected]> wrote:
>> 
>>> Hello,
>>> 
>>> I just began to read about Samza, and I am very excited about it (I was
>>> made aware of its existence by Jay Kreps' post on the Kafka users list, BTW).
>>> 
>>> My first reaction is: are you guys using it at LinkedIn for applications
>>> which lie in the CEP (Complex Event Processing) system domain?
>>> 
>>> To be more specific, would stateful Samza tasks be used in order to compute
>>> complex states such as "event E1 is followed by E2 then by E3 with less
>>> than a 10-minute interval between each event"?
>>> 
>>> I was looking at Storm for CEP, but as pointed out on Samza's Storm page,
>>> Storm leaves state management to the bolts code, whereas Samza has
>>> "something".
>>> 
>>> Beyond state management, what else would make Samza a good building block
>>> for a CEP?  Or a bad one?
>>> 
>>> Thanks,
>>> Alex.
>> 
>> 
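
The pattern Chris describes in the quoted message (write each event and its timestamp to a key/value store as it arrives, then periodically scan for chains that have gone quiet for more than 10 minutes) can be modelled roughly as follows. This is only a sketch in plain Python, not Samza's API: the class name ChainTask, the process() and window() signatures, the (chain key, event type) store layout, and the alerts list are all assumptions made for illustration, and an in-memory dict stands in for the real store. It only checks which events are missing from an expired chain, not the order or spacing of the events that did arrive.

```python
from dataclasses import dataclass, field

WINDOW_SECONDS = 600          # the 10-minute bound from the thread
CHAIN = ("E1", "E2", "E3")    # expected events in a complete chain


@dataclass
class ChainTask:
    """Toy model of a task with a per-message hook and a periodic hook."""

    # Stand-in for the key/value store: (chain key, event type) -> timestamp.
    store: dict = field(default_factory=dict)
    # Stand-in for "send an alert": (chain key, missing event types).
    alerts: list = field(default_factory=list)

    def process(self, key, event_type, timestamp):
        # Keyed writes are idempotent: a redelivered message overwrites
        # the same entry, so duplicates do not distort the chain.
        self.store[(key, event_type)] = timestamp

    def window(self, now):
        # Group entries by chain key. Sorting mimics the ordered scan a
        # range query over a sorted key space would give.
        chains = {}
        for (key, event_type), ts in sorted(self.store.items()):
            chains.setdefault(key, {})[event_type] = ts

        for key, seen in chains.items():
            if now - max(seen.values()) <= WINDOW_SECONDS:
                continue  # updated recently; the chain may still complete
            missing = [e for e in CHAIN if e not in seen]
            if missing:
                self.alerts.append((key, missing))
            # Expired chains (complete or not) are removed from the store.
            for event_type in seen:
                del self.store[(key, event_type)]


task = ChainTask()
task.process("user-1", "E1", 0)
task.process("user-1", "E2", 60)
task.process("user-1", "E2", 60)    # duplicate delivery
task.process("user-2", "E1", 500)
task.window(now=700)                # user-1 expired without E3; user-2 still open
```

After the window() call, task.alerts holds one entry for "user-1" with "E3" missing, while the "user-2" entry stays in the store because it was updated within the last 10 minutes. Keying the store by chain and event type is also why the at-least-once delivery mentioned in the thread is harmless for this particular example.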
