Re: [Architecture] Unifying CEP and BAM Languages

Maninda Edirisooriya Tue, 06 May 2014 10:13:54 -0700

Hi Srinath,

I have some questions (inline).


On Tue, May 6, 2014 at 2:18 AM, Srinath Perera <srin...@wso2.com> wrote:

> Hi All,
>
> I have thought about Paul's idea at the product council to unify the CEP and
> BAM languages. It looks like it can work.
>
> 1. Users can write all the queries using the Siddhi language.
>
That is good. But we have to figure out how to express data-store-specific
properties, such as custom indexes, in Siddhi. Are we going to extend Siddhi
that way? Otherwise we may need to maintain separate configurations to hold
data-store-specific information.

> 2. If the windows defined in queries are large (e.g., more than 15 minutes
> for batch windows, or slides longer than 15 minutes for sliding windows),
> the system will automatically generate Hive scripts and run them as
> MapReduce jobs.
>
We have to think about how to define this switch limit (15 minutes here). Can
we select the limit intelligently by estimating the memory footprint the
window would take? Users who do not know about this internal switch (from
in-memory to batch) may be surprised by the sudden performance drop, and may
write their scripts inefficiently. We should at least try to mitigate that.
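To make the memory-based idea concrete, here is a minimal Python sketch. WindowSpec, its fields, and the budget value are hypothetical, and the estimate assumes a steady input rate and a fixed per-event size; it is not how the engine decides today.

```python
from dataclasses import dataclass


@dataclass
class WindowSpec:
    # Hypothetical description of one query's window; names are illustrative.
    window_seconds: float      # how long the window retains events
    events_per_second: float   # expected input rate of the stream
    bytes_per_event: int       # rough in-memory size of one event


# Fixed rule from the proposal, simplified to look only at window length.
FIXED_LIMIT_SECONDS = 15 * 60


def route_fixed(spec: WindowSpec) -> str:
    return "mapreduce" if spec.window_seconds > FIXED_LIMIT_SECONDS else "cep"


def estimated_window_bytes(spec: WindowSpec) -> float:
    # Events held at once ~ rate * window length, each costing bytes_per_event.
    return spec.window_seconds * spec.events_per_second * spec.bytes_per_event


def route_by_memory(spec: WindowSpec, memory_budget_bytes: float) -> str:
    # Keep the query in memory (CEP) only if its estimated window fits the budget.
    if estimated_window_bytes(spec) > memory_budget_bytes:
        return "mapreduce"
    return "cep"
```

With a 512 MiB budget, a 30-minute window at 10 events/s and 200 bytes per event (about 3.6 MB) stays in CEP, while a 5-minute window at 100,000 events/s and 500 bytes per event (about 15 GB) goes to batch, which is the opposite of what the fixed 15-minute rule would decide.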

> 3. If not, queries get executed via CEP.
> 4. Incoming data can be merged with stored data (e.g., in a database or a
> flat file) via event tables. We will have to do some work to make this
> work seamlessly.
>
By flat files, do you mean files in HDFS or something else? What do you mean
by merging? Are we going to merge in-memory data with database data
virtually?
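If "merging" means a per-event lookup, each in-memory event would be joined with the matching stored row at query time, so nothing is copied up front. A minimal sketch of that reading, using Python's sqlite3 in-memory database as a stand-in for the store behind an event table; the customers table and the field names are hypothetical.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    # Stand-in for the persistent store behind an event table (hypothetical schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, tier TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [("c1", "gold"), ("c2", "silver")],
    )
    return conn


def enrich(conn, events):
    """Join each incoming event with the stored row that matches its key."""
    for event in events:
        row = conn.execute(
            "SELECT tier FROM customers WHERE id = ?", (event["customer_id"],)
        ).fetchone()
        # Events without a stored match pass through with tier=None (left-join semantics).
        yield {**event, "tier": row[0] if row else None}
```

The cost here is one indexed lookup per event, which is why store-specific settings such as custom indexes matter for this kind of merge.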

> 5. If you combine smaller and larger windows, the system should work using
> CEP and MapReduce side by side.
>
Are we running CEP engines as Hadoop MapReduce jobs, or as Hive jobs, or are
we going to start Hadoop jobs from CEP events? If it is the last, how can we
merge the results from CEP and Hadoop? MapReduce jobs are asynchronous,
whereas CEP execution is synchronous AFAIK. So when CEP and MapReduce jobs
run concurrently, should we maintain a barrier-like implementation to detect
the point at which both executions have completed?
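A minimal sketch of such a barrier, assuming both parts can be wrapped as tasks that return partial results: waiting on both futures before merging gives the "both completed" point. run_cep_part and run_batch_part are hypothetical stand-ins, and the sleep only simulates a slower batch job.

```python
import time
from concurrent.futures import ALL_COMPLETED, ThreadPoolExecutor, wait


def run_cep_part():
    # Hypothetical stand-in for the small-window query evaluated in memory.
    return {"cep_count": 42}


def run_batch_part():
    # Hypothetical stand-in for an asynchronous MapReduce/Hive job;
    # the sleep simulates its much longer running time.
    time.sleep(0.1)
    return {"batch_count": 1000}


def run_combined():
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(run_cep_part), pool.submit(run_batch_part)]
        # Barrier: block until both parts have completed before merging.
        wait(futures, return_when=ALL_COMPLETED)
    merged = {}
    for future in futures:
        merged.update(future.result())  # re-raises if either part failed
    return merged
```

Since CEP queries run continuously, in practice such a barrier would presumably be taken once per output period (for example, per batch window) rather than once overall.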

>
> As far as I can tell, anything that can be done with a Hive script can be
> done with the Siddhi language.
>
>
>  BAM                 | CEP
> ---------------------+----------------------------------------------
>  Retrieve all        | from S1
>  Retrieve some       | from S1[condition]
>  Projection          | from .. select
>  Sort                | Has to be implemented via a sort window
>  Group by            | Via partitions or via group by
>  Transform           | Transform function TBD
>  Join                | Join with the right windows
>  Union               | ?
>  Map/Reduce          | partition + queries => map;
>                      | send results to one stream, process => reduce
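To make the Map/Reduce row concrete, a minimal Python sketch of the same shape (not Siddhi syntax): partitioning plus a per-partition query plays the map role, and a single consumer of the merged result stream plays the reduce role. The field names and the count/sum query are hypothetical, and the final sort only loosely mirrors the Sort row.

```python
from collections import defaultdict


def map_phase(events, key):
    """Partition events by key and run the same per-partition query on each."""
    partitions = defaultdict(list)
    for event in events:
        partitions[event[key]].append(event)
    # Illustrative per-partition query: count and sum of "value".
    for k, evs in partitions.items():
        yield {"key": k, "count": len(evs), "total": sum(e["value"] for e in evs)}


def reduce_phase(result_stream):
    """All partition outputs arrive on one stream and are combined here."""
    grand_total = 0
    rows = []
    for row in result_stream:
        grand_total += row["total"]
        rows.append(row)
    rows.sort(key=lambda r: r["total"], reverse=True)
    return rows, grand_total
```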
>
> There are a few built-in functions missing, such as sin(), that we can
> easily add.
>
> Pros
> ===
> One language
> Cleaner model for both batch and real-time analytics
>
> Cons
> ====
> This does not work for "data copied as flat files". Such files need to be
> replayed, which may be expensive.
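The replay cost comes from every stored record having to re-enter the event pipeline one at a time, so it grows linearly with file size. A minimal sketch, assuming a CSV flat file with a header row; send is a hypothetical entry point into the engine's input stream.

```python
import csv
from typing import Callable, Dict, Iterable


def replay(lines: Iterable[str], send: Callable[[Dict[str, str]], None]) -> int:
    """Replay a CSV flat file, pushing every record into the engine as an event."""
    count = 0
    for record in csv.DictReader(lines):
        send(record)  # hypothetical input-stream entry point of the CEP engine
        count += 1
    return count
```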
>
> Thoughts please. Would that work?
>
> Thanks
> Srinath
>
> --
> ============================
> Srinath Perera, Ph.D.
>   Director, Research, WSO2 Inc.
>   Visiting Faculty, University of Moratuwa
>   Member, Apache Software Foundation
>   Research Scientist, Lanka Software Foundation
>   Blog: http://srinathsview.blogspot.com/
>   Photos: http://www.flickr.com/photos/hemapani/
>    Phone: 0772360902
>
> _______________________________________________
> Architecture mailing list
> Architecture@wso2.org
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>
>
