No, the existing regression function should remain, as it's a different
use case. There are many instances where we need to perform regression on a
set of events that is not limited by a time duration. In that case, the
existing regression implementation will be used. However, the duration
parameter should be available for the outlier and forecast extensions as
well, so a user should be able to use the outlier/forecast regression
functions with or without the duration parameter. Your new version should
therefore be applicable to all 3 regression extensions (linear regression,
outlier, forecast).

seshi

On Tue, Jun 7, 2016 at 6:32 PM, Charini Nanayakkara <[email protected]>
wrote:

> Hi Seshika, Suho,
>
> Is the existing regression function to be entirely replaced by the new
> one? If so, the implementations of the outlier and forecast extensions
> must change as well, since those are based on the regression
> implementation. Furthermore, existing applications would be rendered
> useless if the old version were removed entirely. If it's preferred to keep
> both, a new name is required for this extension. Since the regression
> function supports both time and length, a name such as regressTimeLength
> would be appropriate IMO. Please give your suggestions.
>
> Regards,
> Charini
>
> On Sun, Jun 5, 2016 at 11:28 AM, Seshika Fernando <[email protected]>
> wrote:
>
>> Hi,
>> The length ceiling is necessary along with the duration parameter. The
>> batch size was originally implemented to optimize performance when large
>> datasets are considered for regression, so we need to be able to give an
>> upper bound. For example, if a user specifies a large duration (24 hours)
>> and there are millions of events, a batch size of 1 million would consider
>> only the last 1 million events within the last 24 hours, which is a valid
>> use case.
>>
>> For this reason, the ability to specify both duration and batch size is
>> important.
>>
>> Seshi
>> On 2 Jun 2016 14:45, "Charini Nanayakkara" <[email protected]> wrote:
>>
>>> Noted with thanks. I will proceed with the implementation accordingly.
>>>
>>> Charini
>>>
>>> On Thu, Jun 2, 2016 at 2:28 PM, Sriskandarajah Suhothayan <[email protected]>
>>> wrote:
>>>
>>>> I think having both batchSize and duration will be good, as this limits the
>>>> number of events considered; this can also help improve performance.
>>>>
>>>> Suho
>>>>
>>>> On Thu, Jun 2, 2016 at 1:59 PM, Charini Nanayakkara <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Tishan,
>>>>>
>>>>> For my requirement, having a time window alone is adequate, so your
>>>>> point might be valid. However, I'm concerned about the reusability of
>>>>> the extension.
>>>>>
>>>>> @Srinath, WDYT? Which would be the better option: a single
>>>>> implementation or two different ones?
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Thu, Jun 2, 2016 at 1:48 PM, Tishan Dahanayakage <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Charini,
>>>>>>
>>>>>> My knowledge of this domain is sparse, hence I do not know whether a
>>>>>> scenario requiring both time AND length is a valid business case. If it
>>>>>> is a valid business case, +1 for the design including both parameters in
>>>>>> the same implementation.
>>>>>>
>>>>>> Thanks
>>>>>> /Tishan
>>>>>>
>>>>>> On Thu, Jun 2, 2016 at 12:54 PM, Charini Nanayakkara <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi Tishan,
>>>>>>>
>>>>>>> Yes. Assuming the batch size is 5 and the time window is 20 mins, only
>>>>>>> 5 out of 10 events arriving within the last 5 mins would be processed
>>>>>>> due to the batch size constraint (even though all of them would be
>>>>>>> processed if time alone were considered). Separate implementations would
>>>>>>> work for the majority of scenarios, since usually only time OR length is
>>>>>>> applicable, but not both. However, having two implementations would
>>>>>>> cause trouble in situations where both the time factor and the length
>>>>>>> matter (equivalent to an AND operation on the two constraints). If our
>>>>>>> requirement is to have only one of the two constraints, we can use a
>>>>>>> very large value for the other parameter (e.g. if we only need to limit
>>>>>>> the number of events based on a 1 sec time constraint, we can specify
>>>>>>> 1,000,000 as the batch size, assuming we know in advance that 1,000,000
>>>>>>> events would never arrive within 1 sec). IMHO neither of the two options
>>>>>>> (separate or single implementation) is perfect for every scenario.
>>>>>>> However, as I understand it, a single implementation would address more
>>>>>>> cases. What's your opinion on this?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Thu, Jun 2, 2016 at 10:14 AM, Charini Nanayakkara <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> I plan to extend the existing regression function by adding a time
>>>>>>>> parameter. Regression is a function available in the Siddhi stream
>>>>>>>> processor extension known as timeseries. In the current
>>>>>>>> implementation, the regression function consumes two or more
>>>>>>>> parameters and performs regression as follows.
>>>>>>>>
>>>>>>>> The mandatory parameters are the dependent attribute Y and the
>>>>>>>> independent attribute(s) X1, X2, ..., Xn. For simple linear
>>>>>>>> regression, only one independent attribute is given. Two or more
>>>>>>>> independent attributes are given for multiple linear regression.
>>>>>>>>
>>>>>>>> timeseries:regress(Y, X1, X2, ..., Xn)
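For reference, simple linear regression with a single independent attribute fits Y = beta0 + beta1 * X by ordinary least squares. The Java sketch below illustrates only that fit; the class and method names are hypothetical, and this is not the timeseries extension's actual implementation (which also covers multiple regression, ci and calcInterval).

```java
// Hypothetical sketch: ordinary least-squares fit for simple linear
// regression (one independent attribute X), i.e. Y = beta0 + beta1 * X.
// Illustrative only; not the timeseries extension's actual code.
final class SimpleRegressionSketch {

    /** Returns {beta0, beta1} fitted by ordinary least squares. */
    static double[] fit(double[] y, double[] x) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs of equal length");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        // Sums of cross-deviations and squared X deviations.
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        if (sxx == 0) {
            throw new IllegalArgumentException("x has zero variance");
        }
        double beta1 = sxy / sxx;          // slope
        double beta0 = meanY - beta1 * meanX; // intercept
        return new double[] {beta0, beta1};
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};
        double[] y = {3, 5, 7, 9}; // exactly y = 1 + 2x
        double[] beta = fit(y, x);
        System.out.println(beta[0] + " " + beta[1]); // prints "1.0 2.0"
    }
}
```

The argument order fit(y, x) mirrors the Y-first ordering of timeseries:regress, but that choice is only for readability here.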
>>>>>>>>
>>>>>>>> The other three parameters are optional: the calculation interval,
>>>>>>>> the batch size and the confidence interval (ci). If they are not
>>>>>>>> specified, default values are assumed.
>>>>>>>>
>>>>>>>> timeseries:regress(calcInterval, batchSize, ci, Y, X1, X2, ..., Xn)
>>>>>>>>
>>>>>>>> Batch size works as a length window in this implementation, which
>>>>>>>> allows one to restrict the number of events considered when executing
>>>>>>>> regression in real time. For example, if length is 5, only the latest 5
>>>>>>>> events (current event and the 4 events prior to it) would be used for
>>>>>>>> performing regression.
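A batch size that behaves as a length window can be sketched with a bounded deque: each arrival is appended, and the oldest events are evicted until at most batchSize remain. The class name and the {y, x} event layout are assumptions made for illustration only.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of batch size acting as a length window: only the
// latest `batchSize` events (the current event plus batchSize - 1 prior
// ones) are kept for the next regression computation.
final class LengthWindowSketch {
    private final int batchSize;
    private final Deque<double[]> events = new ArrayDeque<>(); // each entry: {y, x}

    LengthWindowSketch(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
    }

    /** Adds the current event and evicts the oldest ones beyond batchSize. */
    void add(double y, double x) {
        events.addLast(new double[] {y, x});
        while (events.size() > batchSize) {
            events.removeFirst();
        }
    }

    int size() {
        return events.size();
    }

    /** X value of the oldest retained event (for illustration). */
    double oldestX() {
        return events.peekFirst()[1];
    }

    public static void main(String[] args) {
        LengthWindowSketch window = new LengthWindowSketch(5);
        for (int i = 1; i <= 8; i++) {
            window.add(2.0 * i, i);
        }
        // Events 4..8 remain: the current event and the 4 before it.
        System.out.println(window.size() + " " + window.oldestX()); // prints "5 4.0"
    }
}
```

With batchSize = 5 and eight arrivals, the retained events are the 4th through the 8th, matching the "current event and the 4 events prior to it" description.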
>>>>>>>>
>>>>>>>> *This suggested extension would allow the user to restrict the
>>>>>>>> number of events based on a time window as well, rather than by
>>>>>>>> length only. Therefore, once my task is complete, the regression
>>>>>>>> function would consume duration as an additional parameter.*
>>>>>>>>
>>>>>>>> *timeseries:regress(calcInterval, duration, batchSize, ci, Y, X1, X2, ..., Xn)*
>>>>>>>>
>>>>>>>> Here the parameter 'duration' would consist of two parts: the first
>>>>>>>> specifies the number and the second specifies the unit (e.g. 2 sec,
>>>>>>>> 5 mins, 7 days). On arrival of each event, the past events considered
>>>>>>>> for regression would be determined by this 'duration' (e.g. if a new
>>>>>>>> event arrives at 10.00 a.m. and the duration is 5 mins, only the
>>>>>>>> events which arrived between 9.55 a.m. and 10.00 a.m. are considered
>>>>>>>> for regression).
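The proposed combination of duration and batchSize (an AND of the two constraints, as discussed in this thread) could be sketched as two eviction passes per arrival: first by time relative to the new event's timestamp, then by count. This is a hypothetical sketch; the class name, passing explicit epoch-millisecond timestamps instead of reading a clock, and treating the window start as inclusive are all assumptions rather than the extension's behaviour.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the proposed duration + batchSize constraint (AND):
// on each arrival, drop events older than `duration` relative to the new
// event's timestamp, then cap the remainder at the latest `batchSize` events.
// Assumption: an event exactly at the window start is kept (inclusive).
final class TimeLengthWindowSketch {
    private final long durationMillis;
    private final int batchSize;
    private final Deque<long[]> events = new ArrayDeque<>(); // each entry: {timestampMillis, id}

    TimeLengthWindowSketch(Duration duration, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.durationMillis = duration.toMillis();
        this.batchSize = batchSize;
    }

    void add(long timestampMillis, long id) {
        events.addLast(new long[] {timestampMillis, id});
        long windowStart = timestampMillis - durationMillis;
        // Time constraint: evict events that arrived before the window start.
        while (!events.isEmpty() && events.peekFirst()[0] < windowStart) {
            events.removeFirst();
        }
        // Length constraint: keep at most batchSize of the remaining events.
        while (events.size() > batchSize) {
            events.removeFirst();
        }
    }

    int size() {
        return events.size();
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        // 20-minute window, batch size 5, 10 events within the first 5 minutes.
        TimeLengthWindowSketch w = new TimeLengthWindowSketch(Duration.ofMinutes(20), 5);
        for (int i = 0; i < 10; i++) {
            w.add(i * 30_000L, i); // one event every 30 s
        }
        System.out.println(w.size()); // prints "5": the batch size caps it
        // An event at minute 25 evicts every event older than minute 5.
        w.add(25 * minute, 10);
        System.out.println(w.size()); // prints "1": only the new event remains
    }
}
```

Seshika's 24-hour example would correspond to new TimeLengthWindowSketch(Duration.ofHours(24), 1_000_000), and applying only one constraint amounts to passing a very large value for the other, as suggested in the thread.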
>>>>>>>>
>>>>>>>> Suggestions and comments are most welcome.
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Charini Vimansha Nanayakkara
>>>>>>>> Software Engineer at WSO2
>>>>>>>> Mobile: 0714126293
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Tishan Dahanayakage
>>>>>> Software Engineer
>>>>>> WSO2, Inc.
>>>>>> Mobile:+94 716481328
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> *S. Suhothayan*
>>>> Technical Lead & Team Lead of WSO2 Complex Event Processor
>>>> *WSO2 Inc.* http://wso2.com
>>>> lean . enterprise . middleware
>>>>
>>>> cell: (+94) 779 756 757 | blog: http://suhothayan.blogspot.com/ |
>>>> twitter: http://twitter.com/suhothayan | linked-in:
>>>> http://lk.linkedin.com/in/suhothayan
>>>>
>>>
>>>
>
>
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
