No, the existing regression function should remain, as it's a different use case. There are many instances where we need to perform regression on a set of events that is not limited by a time duration. In that case, the existing regression implementation will be used. However, the duration parameter should be available for the outlier and forecast extensions as well, so a user should be able to use the outlier/forecast regression functions with or without the duration parameter. Your new version should therefore be applicable to all 3 regression extensions (linear regression, outlier, forecast).
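
To illustrate the "with or without duration" behaviour, here is a minimal sketch in Python (not the actual Siddhi extension code). The class name `RegressionWindow`, the helper `fit_line`, and the use of plain seconds for `duration` are all assumptions made for illustration; it also assumes events arrive in timestamp order. When `duration` is omitted, only the batch size limits the events, as in the existing implementation; when it is given, both constraints apply together.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

# (arrival time in seconds, independent value x, dependent value y)
Event = Tuple[float, float, float]


class RegressionWindow:
    """Hypothetical helper: keeps the events eligible for regression."""

    def __init__(self, batch_size: int, duration: Optional[float] = None) -> None:
        self.batch_size = batch_size
        self.duration = duration  # seconds; None means no time limit
        self.events: Deque[Event] = deque()

    def add(self, timestamp: float, x: float, y: float) -> List[Event]:
        """Add an event and return the events currently inside the window."""
        self.events.append((timestamp, x, y))
        # Length constraint: keep at most batch_size of the latest events.
        while len(self.events) > self.batch_size:
            self.events.popleft()
        # Time constraint (only if a duration was given): drop events older
        # than `duration` seconds relative to the newest event.
        if self.duration is not None:
            while self.events[0][0] < timestamp - self.duration:
                self.events.popleft()
        return list(self.events)


def fit_line(events: List[Event]) -> Tuple[float, float]:
    """Simple linear regression y = intercept + slope * x (least squares)."""
    n = len(events)
    if n < 2:
        raise ValueError("need at least two events")
    mean_x = sum(e[1] for e in events) / n
    mean_y = sum(e[2] for e in events) / n
    sxx = sum((e[1] - mean_x) ** 2 for e in events)
    if sxx == 0:
        raise ValueError("x values are all identical")
    sxy = sum((e[1] - mean_x) * (e[2] - mean_y) for e in events)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope
```

For example, `RegressionWindow(batch_size=5)` behaves like a pure length window, while `RegressionWindow(batch_size=1_000_000, duration=1.0)` is effectively a pure 1-second time window, provided fewer than 1,000,000 events arrive per second.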

Seshi

On Tue, Jun 7, 2016 at 6:32 PM, Charini Nanayakkara <[email protected]> wrote:

> Hi Seshika, Suho,
>
> Is the existing regression function to be entirely replaced by the new one? If so, it's necessary to change the implementation of the outlier and forecast extensions as well, since those are based on the regression implementation. Furthermore, there's the concern of existing applications being rendered useless if the old version is entirely removed. If it's preferred to keep both, a new name is required for this extension. Since the regression function supports both time and length, a name such as regressTimeLength would be appropriate IMO. Please give your suggestions.
>
> Regards,
> Charini
>
> On Sun, Jun 5, 2016 at 11:28 AM, Seshika Fernando <[email protected]> wrote:
>
>> Hi,
>>
>> The length ceiling is necessary along with the duration parameter. The reason the batch size was originally implemented was to optimize performance when large datasets are considered for regression. We need to be able to give an upper bound. For example, if the user specifies a large duration (24 hours) and there are millions of events, a batch size of 1 million means that only the last 1 million events within the last 24 hours are considered, which is a valid use case.
>>
>> For this reason, the ability to specify both duration and batch size is important.
>>
>> Seshi
>>
>> On 2 Jun 2016 14:45, "Charini Nanayakkara" <[email protected]> wrote:
>>
>>> Noted with thanks. Will proceed with the implementation likewise.
>>>
>>> Charini
>>>
>>> On Thu, Jun 2, 2016 at 2:28 PM, Sriskandarajah Suhothayan <[email protected]> wrote:
>>>
>>>> I think having batchSize & duration will be good, as this will limit the number of events considered; this can help to improve performance as well.
>>>>
>>>> Suho
>>>>
>>>> On Thu, Jun 2, 2016 at 1:59 PM, Charini Nanayakkara <[email protected]> wrote:
>>>>
>>>>> Hi Tishan,
>>>>>
>>>>> For my requirement, having a time window alone is adequate, so your point might be valid. However, I'm concerned about the re-usability of the extension.
>>>>>
>>>>> @Srinath, WDYT? Which would be the better option? Having a single implementation or two different ones?
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Thu, Jun 2, 2016 at 1:48 PM, Tishan Dahanayakage <[email protected]> wrote:
>>>>>
>>>>>> Charini,
>>>>>>
>>>>>> My knowledge of this domain is sparse. Hence I do not know whether a scenario where time AND length both apply is a valid business case. If it is a valid business case, +1 for the design including both parameters in the same implementation.
>>>>>>
>>>>>> Thanks
>>>>>> /Tishan
>>>>>>
>>>>>> On Thu, Jun 2, 2016 at 12:54 PM, Charini Nanayakkara <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Tishan,
>>>>>>>
>>>>>>> Yes. Assuming batch size is 5 and time window is 20 mins, only 5 out of 10 events which arrive within last 5 mins would be processed due to the batch size constraint (even though all events would be processed if time alone was considered). Having separate implementations would work in the majority of scenarios, since usually only time OR length is applicable, not both. However, having two implementations would cause trouble in situations where both the time factor and the length are important (equivalent to an AND operation on the two constraints). If our requirement is to have only one of the two constraints, we can use a very large value for the other parameter (i.e. if we only need to limit the number of events based on a time = 1 sec constraint, we can specify 1,000,000 for batch size, assuming we have prior knowledge that 1,000,000 events would never arrive within 1 sec). IMHO neither of the two options (separate or single implementation) is perfect for every scenario. However, having a single implementation would help address more cases as I understand it. What's your opinion on this?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Thu, Jun 2, 2016 at 10:14 AM, Charini Nanayakkara <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> I have planned to extend the existing Regression Function by adding a time parameter. Regression is a functionality available in the Siddhi stream processor extension known as timeseries. In the current implementation, the regression function consumes two or more parameters and performs regression as follows.
>>>>>>>>
>>>>>>>> The mandatory parameters are the dependent attribute Y and the independent attribute(s) X1, X2, ..., Xn. For simple linear regression, only one independent attribute is given. Two or more independent attributes are given for multiple linear regression.
>>>>>>>>
>>>>>>>> timeseries:regress(Y, X1, X2, ..., Xn)
>>>>>>>>
>>>>>>>> The other three optional parameters are calculation interval, batch size and confidence interval (ci). Where those are not specified, the default values are assumed.
>>>>>>>>
>>>>>>>> timeseries:regress(calcInterval, batchSize, ci, Y, X1, X2, ..., Xn)
>>>>>>>>
>>>>>>>> Batch size works as a length window in this implementation, which allows one to restrict the number of events considered when executing regression in real time. For example, if the length is 5, only the latest 5 events (the current event and the 4 events prior to it) would be used for performing regression.
>>>>>>>>
>>>>>>>> *This suggested extension would allow the user to restrict the number of events based on a time window as well, apart from constraining based on length only. Therefore the regression function would consume duration as an additional parameter once my task is complete.*
>>>>>>>>
>>>>>>>> *timeseries:regress(calcInterval, duration, batchSize, ci, Y, X1, X2, ..., Xn)*
>>>>>>>>
>>>>>>>> Here the parameter 'duration' would consist of two parts, where the first part specifies the number and the second part specifies the unit (e.g. 2 sec, 5 mins, 7 days). On arrival of each event, the past events to be considered for performing regression would be based on this 'duration' (i.e. if a new event arrives at 10.00 a.m and the duration is 5 mins, only the events which arrived within the time period of 9.55 a.m to 10.00 a.m are considered for regression).
>>>>>>>>
>>>>>>>> Suggestions and comments are most welcome.
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Charini Vimansha Nanayakkara
>>>>>>>> Software Engineer at WSO2
>>>>>>>> Mobile: 0714126293
>>>>>>
>>>>>> --
>>>>>> Tishan Dahanayakage
>>>>>> Software Engineer
>>>>>> WSO2, Inc.
>>>>>> Mobile: +94 716481328
>>>>
>>>> --
>>>> S. Suhothayan
>>>> Technical Lead & Team Lead of WSO2 Complex Event Processor
>>>> WSO2 Inc. http://wso2.com
>>>> lean . enterprise . middleware
>>>> cell: (+94) 779 756 757 | blog: http://suhothayan.blogspot.com/
>>>> twitter: http://twitter.com/suhothayan | linked-in: http://lk.linkedin.com/in/suhothayan
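
The 'duration' format described in the original proposal (a number followed by a unit, e.g. 2 sec, 5 mins, 7 days) and the 9.55 a.m to 10.00 a.m example can be sketched as follows. This is only an illustration in Python, not the Siddhi implementation: the function names `parse_duration` and `events_in_window`, the set of accepted unit spellings, and the choice to treat the window boundaries as inclusive are all assumptions.

```python
import re
from datetime import datetime, timedelta
from typing import List

# Assumed unit spellings; the thread only mentions "sec", "mins" and "days".
_UNIT_SECONDS = {
    "sec": 1, "secs": 1,
    "min": 60, "mins": 60,
    "hour": 3600, "hours": 3600,
    "day": 86400, "days": 86400,
}


def parse_duration(text: str) -> timedelta:
    """Parse a '<number> <unit>' string such as '5 mins' into a timedelta."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", text)
    if match is None:
        raise ValueError(f"expected '<number> <unit>', got {text!r}")
    number = int(match.group(1))
    unit = match.group(2).lower()
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"unknown duration unit {unit!r}")
    return timedelta(seconds=number * _UNIT_SECONDS[unit])


def events_in_window(arrivals: List[datetime], now: datetime, duration: str) -> List[datetime]:
    """Return the arrival times that fall within `duration` before `now`.

    Both ends of the window are treated as inclusive (an assumption).
    """
    cutoff = now - parse_duration(duration)
    return [t for t in arrivals if cutoff <= t <= now]


# The example from the proposal: a new event at 10.00 a.m with a 5 min duration.
day = datetime(2016, 6, 2)
arrivals = [
    day.replace(hour=9, minute=50),
    day.replace(hour=9, minute=55),
    day.replace(hour=9, minute=58),
    day.replace(hour=10, minute=0),
]
selected = events_in_window(arrivals, day.replace(hour=10, minute=0), "5 mins")
# `selected` holds the 9.55, 9.58 and 10.00 arrivals; the 9.50 one is excluded.
```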
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
