Hi Maninda,

Thanks for the comments! Please see my answers below.

>> I have thought about Paul's idea at the product council to unify CEP and
>> BAM languages. Looks like it can work.
>>
>> 1. Users can write all the queries using the Siddhi language
>>
> That is good. But we have to figure out how to state data-store-specific
> properties, like custom indexes, in Siddhi. Are we going to extend Siddhi
> in such a way?
> Otherwise we may need to maintain a separate configuration to keep the
> data-store-specific information.
>

Does that info go into the Hive queries now? My first reaction is that this
info is best kept separate, but we have to discuss.



>> 2. If the windows defined in queries are large (e.g. more than 15
>> minutes for batch windows, or slides of more than 15 minutes for sliding
>> windows), the system will automatically generate Hive scripts and run the
>> scripts in MapReduce.
>>
> We have to think about how to define this switch limit of 15 minutes. Can we
> select this limit intelligently, by estimating the memory footprint it would
> take? In some cases, users who do not know about the internal switch (from
> in-memory to batch) may be surprised by the sudden performance drop, and may
> write inefficient scripts. We should at least try to mitigate that.
>

I think we have to experiment to find when to switch. Initially, this
will be a configuration option, but it would be good if we could make some
intelligent decisions as well.
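
To make the configuration option concrete, here is a minimal Python sketch of
such a switch. The names (choose_engine, BATCH_THRESHOLD_SECONDS) are
hypothetical, and the 15-minute default is only the value from the proposal
above, not a tuned number.

```python
# Hypothetical sketch: route a query to CEP or Hive/MapReduce by window size.
BATCH_THRESHOLD_SECONDS = 15 * 60  # assumed default; meant to be configurable


def choose_engine(window_seconds, threshold_seconds=BATCH_THRESHOLD_SECONDS):
    """Return "hive" for windows larger than the threshold, else "cep"."""
    if window_seconds <= 0:
        raise ValueError("window size must be positive")
    return "hive" if window_seconds > threshold_seconds else "cep"
```

A smarter version could replace the fixed threshold with an estimate of the
memory the window would need, which is the part we still have to work out.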



>> 3. If not, queries get executed via CEP.
>> 4. Incoming data can be merged with stored data (e.g. a database or a flat
>> file) via event tables. We will have to do some work to make it work
>> seamlessly.
>>
> Do flat files mean files in HDFS, or something else? What do you
> mean by merging? Are we going to merge in-memory data with database data
> virtually?
>
The use case is that you have a data stream that you need to merge (join) with
a flat file. In the Hive world, a flat file is just another table, but in the
streaming world you need a stream to trigger the action. In the real
implementation, we might dump all data to disk and merge, or merge in memory, etc.
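
As a rough illustration of the in-memory variant, the Python sketch below
loads a flat file into a lookup table and lets each incoming event trigger
the join. The file layout, column names, and function names are made up for
the example; a real Siddhi event table would look different.

```python
import csv
import io


def load_table(flat_file, key):
    """Read a CSV flat file into a dict keyed by the join column."""
    return {row[key]: row for row in csv.DictReader(flat_file)}


def join_stream(events, table, key):
    """Yield each event merged with its matching table row (inner join)."""
    for event in events:
        row = table.get(event[key])
        if row is not None:
            yield {**row, **event}


# Example data (hypothetical): a customer file and a stream of purchases.
customers = io.StringIO("id,name\n1,Alice\n2,Bob\n")
purchases = [{"id": "1", "amount": 30}, {"id": "3", "amount": 5}]

table = load_table(customers, "id")
joined = list(join_stream(purchases, table, "id"))
```

Here the event with id 3 has no match in the file and is dropped; whether
unmatched events should be dropped or passed through is one of the details
to decide.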


>> 5. If you combine smaller and larger windows, the system should work using
>> CEP and MapReduce side by side.
>>
> Are we running CEP engines as Hadoop MapReduce jobs, or CEP engines as
> Hive jobs, or are we going to start Hadoop jobs with CEP events? If the last
> is the case, how can we merge the results from CEP and Hadoop? MapReduce
> jobs are asynchronous, whereas CEP execution is synchronous, AFAIK. So should
> we maintain a barrier-like implementation to find the point where both
> executions have completed when CEP and MapReduce jobs are running
> concurrently?
>

My first design is to run CEP by itself and run Hive via MapReduce. We have
to write code to push CEP event streams into Cassandra and, in some cases,
to push Hive outputs into CEP streams. We have to figure out the
details, but this can be done!

--Srinath



>> As far as I can tell, anything that can be done with a Hive script can be
>> done with the Siddhi language.
>>
>>
>> BAM            CEP
>> -------------  ------------------------------------------------
>> Retrieve All   from S1
>> Retrieve Some  from S1[condition]
>> Projection     from .. select
>> Sort           Have to implement via sort window
>> Group By       via partitions or via group by
>> Transform      transform function TBD
>> Join           Join with right windows
>> Union          ?
>> Map/Reduce     partition + queries => map;
>>                send results to one stream, process => reduce
>>
>> There are a few built-in functions missing, like sin(), that we can easily
>> add.
>>
>> Pros
>> ===
>> One language
>> Cleaner model for both batch and realtime analytics
>>
>> Cons
>> ====
>> This does not work for "data copied as flat files". Such files need to be
>> replayed, which may be expensive.
>>
>> Thoughts please. Would that work?
>>
>> Thanks
>> Srinath
>>
>> --
>> ============================
>> Srinath Perera, Ph.D.
>>   Director, Research, WSO2 Inc.
>>   Visiting Faculty, University of Moratuwa
>>   Member, Apache Software Foundation
>>   Research Scientist, Lanka Software Foundation
>>   Blog: http://srinathsview.blogspot.com/
>>   Photos: http://www.flickr.com/photos/hemapani/
>>    Phone: 0772360902
>>
>> _______________________________________________
>> Architecture mailing list
>> Architecture@wso2.org
>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>
>>
>

