Hi,
The length ceiling is necessary along with the duration parameter. The
batch size was originally implemented to optimize performance when large
datasets are considered for regression, so we need to be able to give an
upper bound. For example, if a user specifies a large duration (24 hours)
and there are millions of events, a batch size of 1 million restricts the
regression to the last 1 million events within the last 24 hours. That is
a valid use case.

For this reason, the ability to specify both duration and batch size is
important.
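
As a rough illustration of the combined constraint, here is a minimal
sketch in Python. It is not the Siddhi extension code: the class name
TimeLengthWindow and its parameters are hypothetical, and timestamps are
assumed to arrive in non-decreasing order.

```python
from collections import deque


class TimeLengthWindow:
    """Hypothetical helper: keeps the events that are both within
    `duration_s` seconds of the newest event AND among the latest
    `batch_size` events."""

    def __init__(self, duration_s, batch_size):
        self.duration_s = duration_s
        self.batch_size = batch_size
        self.events = deque()  # (timestamp_s, payload), oldest first

    def add(self, timestamp_s, payload):
        """Add an event and return the payloads still inside the window."""
        self.events.append((timestamp_s, payload))
        # Length constraint: drop the oldest events beyond the batch size.
        while len(self.events) > self.batch_size:
            self.events.popleft()
        # Time constraint: drop events older than the duration.
        cutoff = timestamp_s - self.duration_s
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()
        return [payload for _, payload in self.events]


if __name__ == "__main__":
    # Batch size 5, time window 20 mins, 10 events arriving within 5 mins.
    window = TimeLengthWindow(duration_s=20 * 60, batch_size=5)
    for i in range(10):
        considered = window.add(i * 30, i)
    print(considered)  # prints [5, 6, 7, 8, 9]
```

With a very large batch_size the window behaves like a pure time window,
and with a very large duration it behaves like a pure length window.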

Seshi
On 2 Jun 2016 14:45, "Charini Nanayakkara" <chari...@wso2.com> wrote:

> Noted with thanks. Will proceed with the implementation likewise.
>
> Charini
>
> On Thu, Jun 2, 2016 at 2:28 PM, Sriskandarajah Suhothayan <s...@wso2.com>
> wrote:
>
>> I think having batchSize & duration will be good, as this will limit the
>> number of events considered, which can help to improve performance as well.
>>
>> Suho
>>
>> On Thu, Jun 2, 2016 at 1:59 PM, Charini Nanayakkara <chari...@wso2.com>
>> wrote:
>>
>>> Hi Tishan,
>>>
>>> For my requirement, having the time window alone is adequate, so your point
>>> might be valid. However, I'm concerned about the reusability of the extension.
>>>
>>> @Srinath, WDYT? Which would be the better option? Having a single
>>> implementation or two different ones?
>>>
>>> Thanks
>>>
>>> On Thu, Jun 2, 2016 at 1:48 PM, Tishan Dahanayakage <tis...@wso2.com>
>>> wrote:
>>>
>>>> Charini,
>>>>
>>>> My knowledge of this domain is sparse. Hence I do not know whether a
>>>> scenario requiring both time AND length is a valid business case. If it
>>>> is a valid business case, +1 for the design including both parameters in
>>>> the same implementation.
>>>>
>>>> Thanks
>>>> /Tishan
>>>>
>>>> On Thu, Jun 2, 2016 at 12:54 PM, Charini Nanayakkara <chari...@wso2.com>
>>>> wrote:
>>>>
>>>>> Hi Tishan,
>>>>>
>>>>> Yes. Assuming batch size is 5 and time window is 20 mins, only 5 out
>>>>> of 10 events which arrive within the last 5 mins would be processed due
>>>>> to the batch size constraint (even though all 10 events would be processed
>>>>> if time alone were considered). Having separate implementations would work
>>>>> in the majority of scenarios, since usually only time OR length is
>>>>> applicable, but not both. However, having two implementations would cause
>>>>> trouble in situations where both the time factor and the length are
>>>>> important (equivalent to an AND operation on the two constraints). If our
>>>>> requirement is to have only one of the two constraints, we can use a very
>>>>> large value for the other parameter (e.g. if we only need to limit the
>>>>> number of events based on a time constraint of 1 sec, we can specify
>>>>> 1,000,000 for batch size, assuming we have prior knowledge that 1,000,000
>>>>> events would never arrive within 1 sec). IMHO neither of the two options
>>>>> (separate or single implementation) is perfect for every scenario.
>>>>> However, having a single implementation would help address more cases, as
>>>>> I understand it. What's your opinion on this?
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Thu, Jun 2, 2016 at 10:14 AM, Charini Nanayakkara <
>>>>> chari...@wso2.com> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I plan to extend the existing regression function by adding a time
>>>>>> parameter. Regression is a functionality available in the Siddhi
>>>>>> stream processor extension known as timeseries. In the current
>>>>>> implementation, the regression function consumes two or more parameters
>>>>>> and performs regression as follows.
>>>>>>
>>>>>> The mandatory parameters are the dependent attribute Y and the
>>>>>> independent attribute(s) X1, X2, ..., Xn. For simple linear regression,
>>>>>> only one independent attribute is given. Two or more independent
>>>>>> attributes are given for multiple linear regression.
>>>>>>
>>>>>> timeseries:regress(Y, X1, X2......,Xn)
>>>>>>
>>>>>> The three optional parameters are calculation interval, batch size and
>>>>>> confidence interval (ci). Where those are not specified, default values
>>>>>> are assumed.
>>>>>>
>>>>>> timeseries:regress(calcInterval, batchSize, ci, Y, X1, X2......,Xn)
>>>>>>
>>>>>> Batch size works as a length window in this implementation, which
>>>>>> allows one to restrict the number of events considered when executing
>>>>>> regression in real time. For example, if length is 5, only the latest 5
>>>>>> events (current event and the 4 events prior to it) would be used for
>>>>>> performing regression.
>>>>>>
>>>>>> *The suggested extension would allow the user to restrict the number
>>>>>> of events based on a time window as well, rather than constraining based
>>>>>> on length only. The regression function would therefore consume duration
>>>>>> as an additional parameter once my task is complete.*
>>>>>>
>>>>>> *timeseries:regress(calcInterval, duration, batchSize, ci, Y, X1,
>>>>>> X2......,Xn).*
>>>>>>
>>>>>> Here the parameter 'duration' would consist of two parts, where the
>>>>>> first part specifies the number and the second part specifies the unit
>>>>>> (e.g. 2 sec, 5 mins, 7 days). On arrival of each event, the past events
>>>>>> to be considered for regression would be selected based on this
>>>>>> 'duration' (e.g. if a new event arrives at 10.00 a.m. and the duration is
>>>>>> 5 mins, only the events which arrived between 9.55 a.m. and 10.00 a.m.
>>>>>> are considered for regression).
>>>>>>
>>>>>> Suggestions and comments are most welcome.
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>> --
>>>>>> Charini Vimansha Nanayakkara
>>>>>> Software Engineer at WSO2
>>>>>> Mobile: 0714126293
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Tishan Dahanayakage
>>>> Software Engineer
>>>> WSO2, Inc.
>>>> Mobile:+94 716481328
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>> --
>> S. Suhothayan
>> Technical Lead & Team Lead of WSO2 Complex Event Processor
>> WSO2 Inc. http://wso2.com
>> lean . enterprise . middleware
>>
>> cell: (+94) 779 756 757 | blog: http://suhothayan.blogspot.com/
>> twitter: http://twitter.com/suhothayan
>> linked-in: http://lk.linkedin.com/in/suhothayan
>>
>
>
>
_______________________________________________
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
