Hi Lahiru,

My comments below.

On Tue, Nov 11, 2014 at 11:30 PM, Lahiru Sandaruwan <lahi...@wso2.com> wrote:
> Also, this is a good read on how Netflix uses these algorithms for
> scaling:
>
> http://techblog.netflix.com/search/label/prediction
>
> On Tue, Nov 11, 2014 at 11:07 PM, Lahiru Sandaruwan <lahi...@wso2.com>
> wrote:
>
>> Hi Seshika,
>>
>> Thanks for the detailed response.
>>
>> On Tue, Nov 11, 2014 at 10:08 PM, Seshika Fernando <sesh...@wso2.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I have 2 comments.
>>>
>>> a. The timeseries extension to CEP, which supports uni-variate and
>>> multi-variate linear regression [1], can be used for this. We can use
>>> multi-variate regression to solve the curve fitting described in
>>> Lahiru's email. Basically, we use t and t^2 as x1 and x2. Running
>>> linear regression then gives us a, b, c such that V = a + b*t + c*t^2.
>>>
>>
>> Nice to see this exists in CEP already :)
>>
>> +1 for using multi-variate regression for the curve fitting. Is it
>> available in CEP 3.1.0, or can we plug it into 3.1.0?
>>

It's not available in CEP 3.1.0, but since it's an extension it can easily
be plugged in.

>>> As Lasantha has mentioned, we do have a forecasting facility as well,
>>> but currently it only works for uni-variate regression, which is not
>>> the case here. If you really need it, I might be able to extend it for
>>> this use case. For the moment, you can still use the existing
>>> regression facility to determine the coefficients and do the
>>> forecasting yourself (which is just plugging those values into the
>>> above equation, with the relevant t values).
>>> Let me also mention that even though the function is 'linear'
>>> regression, we can use linear regression to fit polynomial curves as
>>> long as we know the degree of the polynomial function (which in this
>>> case we do).
>>>
>>
>> In the Stratos case, I would like to have the forecasting flexibility on
>> the Autoscaler side, because then we can let the user decide the
>> prediction time duration even at runtime.
>>
>> BTW.
>> Is it possible to do this in CEP, by redeploying execution plans or
>> so?
>>

Currently we have a forecast function for univariate regression, where you
provide the y stream, the x stream, and the x value you want to predict
for. For example, timeseries:forecast(x+5, y, x) uses the y and x streams
to compute the regression equation and then returns the forecast y value
for x+5, so you can supply the prediction time here. However, the current
implementation only caters to the scenario with one independent variable.
So if you want to do this on the CEP side, you can get the coefficients
using the timeseries:regress() function and then write a further Siddhi
query that computes v = a + b*t + c*t^2 for any incoming t value.

>>> b. Can't we also consider using exponentially weighted moving averages
>>> for the previous approach? Instead of using the average gradient and
>>> average second derivative, we can use 'decaying windows' in CEP and get
>>> the exponentially weighted moving average of the gradient and second
>>> derivative. This will eliminate the spawning of new instances due to
>>> sudden 'spikes', as we can control the decay factor so that the most
>>> recent events get a practically acceptable weightage compared to older
>>> events.
>>>
>>
>> A great thought. An exponentially weighted moving average filter would be
>> a good addition for spike avoidance.
>>
>> Thanks.
>>
>>> Seshika
>>>
>>> [1] https://docs.wso2.com/display/CEP400/Regression
>>>
>>> On Tue, Nov 11, 2014 at 8:51 PM, Lasantha Fernando <
>>> lasantha....@gmail.com> wrote:
>>>
>>>> Hi Lahiru,
>>>>
>>>> Would it be possible to use the linear regression already available as
>>>> Siddhi extensions in [1], or maybe improve on those existing extensions
>>>> so that they fit polynomial curves? The code is available here [2].
>>>>
>>>> I think forecasting is also available, which can be useful in this
>>>> use case. WDYT? Just sharing my 2 cents..
>>>> :-)
>>>>
>>>> [1] http://mail.wso2.org/mailarchive/architecture/2014-March/015696.html
>>>> [2] https://github.com/wso2-dev/siddhi/tree/master/modules/siddhi-extensions
>>>>
>>>> Thanks,
>>>> Lasantha
>>>>
>>>> On Tue, Nov 11, 2014 at 3:58 PM, Lahiru Sandaruwan <lahi...@wso2.com>
>>>> wrote:
>>>> > Hi all,
>>>> >
>>>> > This contains the content I already sent to Stratos dev. The idea is
>>>> > to highlight and separate the new improvement.
>>>> >
>>>> > Current implementation
>>>> >
>>>> > Currently CEP calculates the average, gradient, and second derivative
>>>> > and sends those values to Autoscaler. Autoscaler then predicts the
>>>> > values using S = u*t + 0.5*a*t*t.
>>>> >
>>>> > In this method the CEP calculation is not very accurate, as it does
>>>> > not consider all the events when calculating the gradient and second
>>>> > derivative. Therefore the equation we apply doesn't yield the best
>>>> > prediction.
>>>> >
>>>> > Proposed Implementation
>>>> >
>>>> > CEP's task
>>>> >
>>>> > I think the best approach is to do "curve fitting" [1] for the event
>>>> > sample received in a particular time window. Refer to the "Locally
>>>> > weighted linear regression" section at [2] for more details.
>>>> >
>>>> > We would need a second degree polynomial fitter for this, for which
>>>> > we can use the Apache Commons Math library. Refer to the sample at
>>>> > [3]; we can run this with any degree, e.g. 2 or 3. Increasing the
>>>> > degree increases the accuracy of the fit.
>>>> >
>>>> > E.g.
>>>> > If we use a degree 2 polynomial fitter, we will get an equation like
>>>> > the one below, where value (v) is our statistic value and time (t) is
>>>> > the time of the event.
>>>> >
>>>> > Equation we get from received events,
>>>> > v = a*t*t + b*t + c
>>>> >
>>>> > So the solution is,
>>>> >
>>>> > - Find memberwise curves that fit the events received in a specific
>>>> >   window (say 10 minutes) at CEP
>>>> > - Send the parameters of the fitted curve (a, b, and c in the above
>>>> >   equation), with the timestamp of the last event (T) in the window,
>>>> >   to Autoscaler
>>>> >
>>>> > Autoscaler's task
>>>> >
>>>> > Autoscaler uses the v = a*t*t + b*t + c function to predict the value
>>>> > at any timestamp after the last timestamp
>>>> >
>>>> > E.g. say we need to find the value (v) after 1 minute (assuming we
>>>> > carried out all the calculations in milliseconds),
>>>> >
>>>> > v = a * (T+60000) * (T+60000) + b * (T+60000) + c
>>>> >
>>>> > So we have memberwise predictions, and we can find the clusterwise
>>>> > prediction by averaging all the memberwise values.
>>>> >
>>>> > Please send your thoughts.
>>>> >
>>>> > Thanks.
>>>> >
>>>> > [1] http://en.wikipedia.org/wiki/Curve_fitting
>>>> > [2] http://cs229.stanford.edu/notes/cs229-notes1.pdf
>>>> > [3] http://commons.apache.org/proper/commons-math/userguide/fitting.html
>>>> >
>>>> > --
>>>> > Lahiru Sandaruwan
>>>> > Committer and PMC member, Apache Stratos,
>>>> > Senior Software Engineer,
>>>> > WSO2 Inc., http://wso2.com
>>>> > lean.enterprise.middleware
>>>> >
>>>> > email: lahi...@wso2.com blog: http://lahiruwrites.blogspot.com/
>>>> > linked-in: http://lk.linkedin.com/pub/lahiru-sandaruwan/16/153/146
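
For reference, here is a minimal sketch of the calculation discussed in this thread: fit v = a + b*t + c*t^2 by running linear regression with t and t^2 as the two regressors, forecast by plugging in a later t, and average the memberwise forecasts for the cluster. It is plain Python and does not use the CEP timeseries extension or Commons Math; the function names (fit_quadratic, predict, cluster_prediction) are hypothetical. One assumption that is not from the thread: time is taken as an offset from the last timestamp T (t - T) rather than as an absolute timestamp, because squaring raw millisecond timestamps tends to make the least-squares system badly conditioned.

```python
from typing import List, Sequence, Tuple


def _solve(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a small square linear system (Gaussian elimination, partial pivoting)."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; a quadratic fit needs 3 distinct time values")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_quadratic(times: Sequence[float], values: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of v = a + b*dt + c*dt^2, where dt = t - times[-1].

    This is ordinary multi-variate linear regression with x1 = dt and x2 = dt^2.
    Using the offset dt instead of the absolute timestamp is an assumption of
    this sketch, made for numerical stability.
    """
    last = times[-1]
    rows = [(1.0, t - last, (t - last) ** 2) for t in times]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(3)]
    a, b, c = _solve(xtx, xty)
    return a, b, c


def predict(coeffs: Tuple[float, float, float], dt: float) -> float:
    """Forecast the value dt time units after the last event in the window."""
    a, b, c = coeffs
    return a + b * dt + c * dt * dt


def cluster_prediction(member_predictions: Sequence[float]) -> float:
    """Clusterwise prediction as the plain average of the memberwise predictions."""
    return sum(member_predictions) / len(member_predictions)
```

With coefficients fitted on millisecond offsets, the value one minute after the last event would be predict(coeffs, 60000), which corresponds to the T+60000 example in Lahiru's mail.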