Hi all,

+1 for the idea. IMO using the *from* clause on event tables to do batch
processing makes sense and fits the current Siddhi model nicely.

For supporting the *from* clause when the timestamp is not given, can we
add a timestamp internally to the table if the event table definition does
not contain a timestamp attribute?

i.e.

- If the event table definition contains a timestamp attribute, that will
be used (If not, we let the user specify what should be picked up as the
timestamp attribute)
- If the timestamp is not available, and not specified either, we add a
timestamp internally.
- At the time of inserting the event, we check whether the timestamp is
there and if not add the timestamp at the time of inserting the event to
the table.

The timestamp attribute will have to be treated a bit specially.

I believe that this approach does have scenarios/use-cases where the
behaviour would be not as expected. (e.g. if timestamp is null or corrupted
for only a selected set of events; also setting the timestamp of the event
to the time at which it was received by the server does not make sense at
all times). Just throwing out the idea to see if it is feasible.. :-)

Thanks,
Lasantha

On 22 November 2014 at 08:00, Srinath Perera <srin...@wso2.com> wrote:

> IMO, use of window batch is consistant (My argument is, it is interpreted
> as batch process when used with event tables). But lets think more on that.
>
> --Srinath
>
> On Sat, Nov 22, 2014 at 7:56 AM, Sriskandarajah Suhothayan <s...@wso2.com>
> wrote:
>
>> +1
>> I like the idea, only worried about the use of *#window.batch()*
>> How about using *WeatherStream#pull#window.time(24h) *instead?
>>
>> Suho
>>
>> On Fri, Nov 21, 2014 at 8:33 PM, Srinath Perera <srin...@wso2.com> wrote:
>>
>>> Hi All,
>>>
>>>
>>> I think I have finally figured out a way do the Paul's idea (See the
>>> thread "Unifying CEP and BAM Languages"
>>> http://mail.wso2.org/mailarchive/architecture/2014-May/016366.html for
>>> more details).
>>>
>>>
>>> Currently event tables in Siddhi are backed by a disk and can be used to
>>> joins
>>>
>>> However, you must use it with a stream and it is triggered from a
>>> stream. For example currently *from WeatherStream insert into ..* is
>>> not defined.  *Idea is to use that for batch processing!*
>>>
>>> Proposal is to extend event tables definition
>>>
>>> 1. Add an timestamp field  to event tables (Then BAM stream also become
>>> a event table)
>>>
>>> 2. *Define “from” operation on event table to cover batch processing*.
>>> For example
>>>
>>> from EvetTable#window.batch(2h) avg(pressure) as avgPressure will run
>>> as batch job with 2h data using data in the event table. (We will execute
>>> this in Spark using Siddhi engine). If #window.batch(2h) is omitted,
>>> #window.batch(5m) is assumed.
>>>
>>> 3. You can define an event table on top of DB, disk, Cassandra, hdfs,
>>> BAM stream stored etc ..
>>>
>>> Let me take an example
>>>
>>> Say you want to calculate avg and stddev of pressure once every 24h as a
>>> batch job, and raise an alarm if current pressure is less than 3 stddev
>>> from the mean calculated in batch process. (this is a extended Lambda
>>> architecture scenario) Then you can do all this with Siddhi, and query will
>>> look like following.
>>>
>>> *define eventTable WeatherTable using BAMStream … *
>>>
>>> *define eventTable  WeatherStatsTable using BAMStream … *
>>>
>>> *//batch job*
>>>
>>> *from **WeatherTable**#window.batch(24h) stddev(pressure) , avg
>>> (pressure)*
>>>
>>> *     insert into **WeatherStatsTable**; *
>>>
>>> *//use results from batch job in realtime *
>>>
>>> *from WeatherStream as  p join **WeatherStatsTable**#window.last() as s
>>> *
>>>
>>> *          on pressure < s.mean -2*s.stddev*
>>>
>>> *     insert into WeatherAlarmStream; *
>>>
>>> First query runs once every 24 hours and calculate mean and stddev and
>>> write to disk. That value is joined against the live stream as they come in
>>> via join.
>>>
>>> Few more rules (and there are more details that we need to figure out)
>>>
>>> 1. You read from event table, then it runs as batch processes
>>>
>>> 2. If you join with event table, it works as now. However, you can
>>> define a window when joining on top of event tables as well. e.g.
>>> WeatherStats#window.batch(5m) means takes events came in last 5 mins.
>>>
>>> 3. We need to define how it behaves when timestamp field is not defined.
>>> Best is to only support *joins* and not support *from* in that case.
>>>
>>> 4. When processing batches, it runs in parallel if partitions are
>>> defined. For example, if you want to calculate mean in map reduce style, it
>>> will look like following.
>>>
>>> *define partition StationParition WeatherStream.stationID; *
>>>
>>> *from WeatherStream#window.batch(24h) avg(pressure) *
>>>
>>> *     insert into WeatherMeanStream using StationParition;  *
>>>
>>> If no partitions, it will run sequentially (We can improve on this
>>> later).
>>>
>>> For execution, we want to init siddhi within Spark, and run thing in
>>> parallel using spark, but actual evaluation will done by Siddhi.
>>>
>>> 5. Users can define other windows like sliding windows with event
>>> tables. However, Siddhi will read data from disk once every 5 minutes or
>>> so. So results will be the same, however, it might come bit later than with
>>> streams.
>>>
>>>
>>> Please comment
>>>
>>> Thanks
>>>
>>> Srinath
>>>
>>>
>>>
>>>
>>> --
>>> ============================
>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>> Site: http://people.apache.org/~hemapani/
>>> Photos: http://www.flickr.com/photos/hemapani/
>>> Phone: 0772360902
>>>
>>
>>
>>
>> --
>>
>> *S. Suhothayan*
>> Technical Lead & Team Lead of WSO2 Complex Event Processor
>>  *WSO2 Inc. *http://wso2.com
>> * <http://wso2.com/>*
>> lean . enterprise . middleware
>>
>>
>> *cell: (+94) 779 756 757 <%28%2B94%29%20779%20756%20757> | blog:
>> http://suhothayan.blogspot.com/ <http://suhothayan.blogspot.com/>twitter:
>> http://twitter.com/suhothayan <http://twitter.com/suhothayan> | linked-in:
>> http://lk.linkedin.com/in/suhothayan <http://lk.linkedin.com/in/suhothayan>*
>>
>
>
>
> --
> ============================
> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
> Site: http://people.apache.org/~hemapani/
> Photos: http://www.flickr.com/photos/hemapani/
> Phone: 0772360902
>
> _______________________________________________
> Architecture mailing list
> Architecture@wso2.org
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>
>


-- 
*Lasantha Fernando*
Software Engineer - Data Technologies Team
WSO2 Inc. http://wso2.com

email: lasan...@wso2.com
mobile: (+94) 71 5247551
_______________________________________________
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture

Reply via email to