DAS deployment might end up collecting data from multiple time zones? Is that supported? How is it handled? Does end users have to issue queries in UTC?
thoughts please --Srinath On Wed, Jun 15, 2016 at 2:11 AM, Mohanadarshan Vivekanandalingam < mo...@wso2.com> wrote: > Hi Inosh, > > Thanks for the fix.. We have incorporated this to analytics-is.. > > Regards, > Mohan > > > On Tue, Jun 14, 2016 at 7:33 PM, Inosh Goonewardena <in...@wso2.com> > wrote: > >> Hi All, >> >> We made a modification to incremental data processing syntax and >> implementation in order to avoid inconsistencies occur with timeWindows. >> >> For example, with the previous approach if the timeWindow is specified as >> 3600 (or 86400) incremental read time is getting set to the hour starting >> time (or day starting) time of UTC time. It is due to the the fact that >> incremental start time is calculated by performing modulo operation on a >> timestamp. In most of the scenarios, analytics is implemented in per <time >> unit> basis, i.e., we maintain summary tables for per minute, per hour, per >> day, per month. In these summary tables, we do windowing based on the local >> timezone of the server which DAS runs [1]. Since incremental time window >> start at UTC time (hour or day) inconsistencies happens in aggregate values >> of time windows. >> >> With the new introduced changes, incremental window start time is set >> based on local time zone to avoid aforementioned inconsistencies. Syntax is >> as follows. >> >> CREATE TEMPORARY TABLE T1 USING CarbonAnalytics options (tableName "test", >> incrementalProcessing "T1, *WindowUnit*, WindowOffset")*;* >> >> >> WindowUnit concept is similar to the previous concept of TimeWindow, but >> in here corresponding window time unit needs to be provided instead of time >> in millis. The units supported are SECOND, MINUTE, HOUR, DAY, MONTH, YEAR. >> >> For example, lets say server runs in IST time and last processed event >> time is 1461111538669 (2016/04/20 05:48:58 PM IST). If the WindowUnit is >> set as DAY and WindowOffset is set to 0 in the incremental table, next >> script execution will read data starting from 1461090600000 (2016/04/20 >> 12:00:00 AM IST). >> >> [1] [Analytics] Using UTC across all temporal analytics functions >> >> >> >> On Fri, Mar 25, 2016 at 10:59 AM, Gihan Anuruddha <gi...@wso2.com> wrote: >> >>> Actually currently in ESB-analytics also we are using a similar >>> mechanism. In there we keep 5 tables for each time unites(second, minute, >>> hour, day, month) and we have windowOffset in correspondent to it time >>> unite like in minute-60, hour- 3600 etc. We are only storing min, max, sum >>> and count values only. So at a given time if we need the average, we divide >>> the sum with count and get that. >>> >>> On Fri, Mar 25, 2016 at 8:47 AM, Inosh Goonewardena <in...@wso2.com> >>> wrote: >>> >>>> >>>> >>>> On Thu, Mar 24, 2016 at 9:56 PM, Sachith Withana <sach...@wso2.com> >>>> wrote: >>>> >>>>> Hi Inosh, >>>>> >>>>> We wouldn't have to do that IMO. >>>>> >>>>> We can persist the total aggregate value upto currentTimeWindow - >>>>> WindowOffset, along with the previous time window aggregation meta data as >>>>> well ( count, sum in the average aggregation case). >>>>> >>>> >>>> Yep. That will do. >>>> >>>>> >>>>> The previous total wouldn't be calculated again, it's the last two >>>>> time windows ( including the current one) that we need to recalculate and >>>>> add it to the previous total. >>>>> >>>>> It works almost the same way as the current incremental processing >>>>> table, but keeping more meta_data on the aggregation related values. >>>>> >>>>> @Anjana WDYT? >>>>> >>>>> Thanks, >>>>> Sachith >>>>> >>>>> >>>>> On Fri, Mar 25, 2016 at 7:07 AM, Inosh Goonewardena <in...@wso2.com> >>>>> wrote: >>>>> >>>>>> Hi Sachith, >>>>>> >>>>>> >>>>>> *WindowOffset*: >>>>>>> >>>>>>> Events might arrive late that would belong to a previous processed >>>>>>> time window. To account to that, we have added an optional parameter >>>>>>> that >>>>>>> would allow to >>>>>>> process immediately previous time windows as well ( acts like a >>>>>>> buffer). >>>>>>> ex: If this is set to 1, apart from the to-be-processed data, data >>>>>>> related to the previously processed time window will also be taken for >>>>>>> processing. >>>>>>> >>>>>> >>>>>> >>>>>> This means, if window offset is set, already processed data will be >>>>>> processed again. How does the aggregate functions works in this case? >>>>>> Actually, what is the plan on supporting aggregate functions? Let's take >>>>>> average function as an example. Are we going to persist sum and count >>>>>> values per time windows and re-calculate whole average based on values of >>>>>> all time windows? Is so, I would guess we can update previously processed >>>>>> time windows sum and count values. >>>>>> >>>>>> >>>>>> On Thu, Mar 24, 2016 at 3:50 AM, Sachith Withana <sach...@wso2.com> >>>>>> wrote: >>>>>> >>>>>>> Hi Srinath, >>>>>>> >>>>>>> Please find the comments inline. >>>>>>> >>>>>>> On Thu, Mar 24, 2016 at 11:39 AM, Srinath Perera <srin...@wso2.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Sachith, Anjana, >>>>>>>> >>>>>>>> +1 for the backend model. >>>>>>>> >>>>>>>> Are we handling the case when last run was done, 25 minutes of data >>>>>>>> is processed. Basically, next run has to re-run last hour and update >>>>>>>> the >>>>>>>> value. >>>>>>>> >>>>>>> >>>>>>> Yes. It will recalculate for that hour and will update the value. >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> When does one hour counting starts? is it from the moment server >>>>>>>> starts? That will be probabilistic when you restart. I think we need to >>>>>>>> either start with know place ( midnight) or let user configure it. >>>>>>>> >>>>>>> >>>>>>> In the first run all the data available are processed. >>>>>>> After that it calculates the floor of last processed events' >>>>>>> timestamp and gets the floor value (timestamp - timestamp%3600), that >>>>>>> would >>>>>>> be used as the start of the time windows. >>>>>>> >>>>>>>> >>>>>>>> I am bit concern about the syntax though. This only works for very >>>>>>>> specific type of queries ( that includes aggregate and a group by). >>>>>>>> What >>>>>>>> happen if user do this with a different query? Can we give clear error >>>>>>>> message? >>>>>>>> >>>>>>> >>>>>>> Currently the error messages are very generic. We will have to work >>>>>>> on it to improve those messages. >>>>>>> >>>>>>> Thanks, >>>>>>> Sachith >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> --Srinath >>>>>>>> >>>>>>>> On Mon, Mar 21, 2016 at 5:15 PM, Sachith Withana <sach...@wso2.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> We are adding incremental processing capability to Spark in DAS. >>>>>>>>> As the first stage, we added time slicing to Spark execution. >>>>>>>>> >>>>>>>>> Here's a quick introduction into that. >>>>>>>>> >>>>>>>>> *Execution*: >>>>>>>>> >>>>>>>>> In the first run of the script, it will process all the data in >>>>>>>>> the given table and store the last processed event timestamp. >>>>>>>>> Then from the next run onwards it would start processing starting >>>>>>>>> from that stored timestamp. >>>>>>>>> >>>>>>>>> Until the query that contains the data processing part, completes, >>>>>>>>> last processed event timestamp would not be overridden with the new >>>>>>>>> value. >>>>>>>>> This is to ensure that the data processing for the query wouldn't >>>>>>>>> have to be done again, if the whole query fails. >>>>>>>>> This is ensured by adding a commit query after the main query. >>>>>>>>> Refer to the Syntax section for the example. >>>>>>>>> >>>>>>>>> *Syntax*: >>>>>>>>> >>>>>>>>> In the spark script, incremental processing support has to be >>>>>>>>> specified per table, this would happen in the create temporary table >>>>>>>>> line. >>>>>>>>> >>>>>>>>> ex: CREATE TEMPORARY TABLE T1 USING CarbonAnalytics options >>>>>>>>> (tableName "test", >>>>>>>>> *incrementalProcessing "T1,3600");* >>>>>>>>> >>>>>>>>> INSERT INTO T2 SELECT username, age GROUP BY age FROM T1; >>>>>>>>> >>>>>>>>> INC_TABLE_COMMIT T1; >>>>>>>>> >>>>>>>>> The last line is where it ensures the processing took place >>>>>>>>> successfully and replaces the lastProcessed timestamp with the new >>>>>>>>> one. >>>>>>>>> >>>>>>>>> *TimeWindow*: >>>>>>>>> >>>>>>>>> To do the incremental processing, the user has to provide the time >>>>>>>>> window per which the data would be processed. >>>>>>>>> In the above example. the data would be summarized in *1 hour *time >>>>>>>>> windows. >>>>>>>>> >>>>>>>>> *WindowOffset*: >>>>>>>>> >>>>>>>>> Events might arrive late that would belong to a previous processed >>>>>>>>> time window. To account to that, we have added an optional parameter >>>>>>>>> that >>>>>>>>> would allow to >>>>>>>>> process immediately previous time windows as well ( acts like a >>>>>>>>> buffer). >>>>>>>>> ex: If this is set to 1, apart from the to-be-processed data, data >>>>>>>>> related to the previously processed time window will also be taken for >>>>>>>>> processing. >>>>>>>>> >>>>>>>>> >>>>>>>>> *Limitations*: >>>>>>>>> >>>>>>>>> Currently, multiple time windows cannot be specified per temporary >>>>>>>>> table in the same script. >>>>>>>>> It would have to be done using different temporary tables. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> *Future Improvements:* >>>>>>>>> - Add aggregation function support for incremental processing >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Sachith >>>>>>>>> -- >>>>>>>>> Sachith Withana >>>>>>>>> Software Engineer; WSO2 Inc.; http://wso2.com >>>>>>>>> E-mail: sachith AT wso2.com >>>>>>>>> M: +94715518127 >>>>>>>>> Linked-In: <http://goog_416592669> >>>>>>>>> https://lk.linkedin.com/in/sachithwithana >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> ============================ >>>>>>>> Srinath Perera, Ph.D. >>>>>>>> http://people.apache.org/~hemapani/ >>>>>>>> http://srinathsview.blogspot.com/ >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Sachith Withana >>>>>>> Software Engineer; WSO2 Inc.; http://wso2.com >>>>>>> E-mail: sachith AT wso2.com >>>>>>> M: +94715518127 >>>>>>> Linked-In: <http://goog_416592669> >>>>>>> https://lk.linkedin.com/in/sachithwithana >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Architecture mailing list >>>>>>> Architecture@wso2.org >>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Thanks & Regards, >>>>>> >>>>>> Inosh Goonewardena >>>>>> Associate Technical Lead- WSO2 Inc. >>>>>> Mobile: +94779966317 >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Sachith Withana >>>>> Software Engineer; WSO2 Inc.; http://wso2.com >>>>> E-mail: sachith AT wso2.com >>>>> M: +94715518127 >>>>> Linked-In: <http://goog_416592669> >>>>> https://lk.linkedin.com/in/sachithwithana >>>>> >>>> >>>> >>>> >>>> -- >>>> Thanks & Regards, >>>> >>>> Inosh Goonewardena >>>> Associate Technical Lead- WSO2 Inc. >>>> Mobile: +94779966317 >>>> >>>> _______________________________________________ >>>> Architecture mailing list >>>> Architecture@wso2.org >>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>> >>>> >>> >>> >>> -- >>> W.G. Gihan Anuruddha >>> Senior Software Engineer | WSO2, Inc. >>> M: +94772272595 >>> >>> _______________________________________________ >>> Architecture mailing list >>> Architecture@wso2.org >>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>> >>> >> >> >> -- >> Thanks & Regards, >> >> Inosh Goonewardena >> Associate Technical Lead- WSO2 Inc. >> Mobile: +94779966317 >> > > > > -- > *V. Mohanadarshan* > *Associate Tech Lead,* > *Data Technologies Team,* > *WSO2, Inc. http://wso2.com <http://wso2.com> * > *lean.enterprise.middleware.* > > email: mo...@wso2.com > phone:(+94) 771117673 > > _______________________________________________ > Architecture mailing list > Architecture@wso2.org > https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture > > -- ============================ Blog: http://srinathsview.blogspot.com twitter:@srinath_perera Site: http://home.apache.org/~hemapani/ Photos: http://www.flickr.com/photos/hemapani/ Phone: 0772360902
_______________________________________________ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture