Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]

2016-04-03 Thread Nirmal Fernando
Hi Mahesh,

We are in the process of evaluating proposals. Until then, you can start
on some project-related tasks and update us on what you did. Also feel
free to ask any questions that you may have.

On Thu, Mar 31, 2016 at 2:48 PM, Mahesh Dananjaya <dananjayamah...@gmail.com
> wrote:

> Hi Maheshakya,
> Google has accepted my proof of enrollment. Do I need to proceed
> further with the project? I have been working with Spark MLlib and
> trying to implement those two algorithms. Can you please tell me what
> the next step is? Do I need to wait? Thank you.
> regards,
> Mahesh.
>
> On Fri, Mar 25, 2016 at 10:40 PM, Mahesh Dananjaya <
> dananjayamah...@gmail.com> wrote:
>
>> Hi Maheshakya,
>> Thank you very much for the support given during the last couple of
>> weeks. I have finally submitted the proposal to the site, and I am looking
>> forward to contributing to WSO2 ML. Thank you.
>> regards,
>> Mahesh.
>>
>> On Fri, Mar 25, 2016 at 7:49 PM, Mahesh Dananjaya <
>> dananjayamah...@gmail.com> wrote:
>>
>>> Hi maheshakya,
>>> I added the timeline to the best of my knowledge and uploaded it. Please
>>> check. Thank you.
>>> regards,
>>> Mahesh.
>>>
>>> On Fri, Mar 25, 2016 at 7:09 PM, Maheshakya Wijewardena <
>>> mahesha...@wso2.com> wrote:
>>>
>>>> Hi Mahesh,
>>>>
>>>> Can you add the timeline of the project as I've mentioned? It's one of
>>>> the crucial parts of the proposal, as it allows us to evaluate the
>>>> feasibility of the project within the time period given by Google.
>>>>
>>>> Best regards.
>>>>
>>>> On Fri, Mar 25, 2016 at 6:53 PM, Mahesh Dananjaya <
>>>> dananjayamah...@gmail.com> wrote:
>>>>
>>>>>
>>>>> -- Forwarded message --
>>>>> From: Mahesh Dananjaya <dananjayamah...@gmail.com>
>>>>> Date: Fri, Mar 25, 2016 at 7:02 PM
>>>>> Subject: Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]
>>>>> To: Maheshakya Wijewardena <mahesha...@wso2.com>
>>>>>
>>>>>
>>>>> Hi maheshakya,
>>>>> I have uploaded my final submission. Here it is. Please check it and
>>>>> let me know if there is anything I need to change. Thank you.
>>>>> BR,
>>>>> Mahesh.
>>>>>
>>>>> On Fri, Mar 25, 2016 at 6:28 PM, Mahesh Dananjaya <
>>>>> dananjayamah...@gmail.com> wrote:
>>>>>
>>>>>> Hi Maheshakya,
>>>>>> Thank you very much. I will update the proposal with those
>>>>>> changes and submit it now. Thank you.
>>>>>> regards,
>>>>>> Mahesh.
>>>>>>
>>>>>> On Fri, Mar 25, 2016 at 6:07 PM, Maheshakya Wijewardena <
>>>>>> mahesha...@wso2.com> wrote:
>>>>>>
>>>>>>> Hi Mahesh,
>>>>>>>
>>>>>>> In the title, please include both tags [ML] and [CEP]
>>>>>>>
>>>>>>> Best regards.
>>>>>>>
>>>>>>> On Fri, Mar 25, 2016 at 5:49 PM, Maheshakya Wijewardena <
>>>>>>> mahesha...@wso2.com> wrote:
>>>>>>>
>>>>>>>> Also, please include an introduction to yourself (university,
>>>>>>>> department), past experience in machine learning, language
>>>>>>>> proficiency, etc. at the beginning of the proposal.
>>>>>>>>
>>>>>>>> Best regards.
>>>>>>>>
>>>>>>>> On Fri, Mar 25, 2016 at 5:47 PM, Maheshakya Wijewardena <
>>>>>>>> mahesha...@wso2.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Mahesh,
>>>>>>>>>
>>>>>>>>> Thank you for sending the draft. Please submit it as soon as
>>>>>>>>> possible.
>>>>>>>>>
>>>>>>>>> A few high-level comments:
>>>>>>>>>
>>>>>>>>> In the proposal, you must specifically mention that this will be
>>>>>>>>> implemented as a Siddhi extension that can operate directly on
>>>>>>>>> incoming streams.
>>

Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]

2016-03-31 Thread Mahesh Dananjaya
Hi Maheshakya,
Google has accepted my proof of enrollment. Do I need to proceed
further with the project? I have been working with Spark MLlib and
trying to implement those two algorithms. Can you please tell me what
the next step is? Do I need to wait? Thank you.
regards,
Mahesh.

On Fri, Mar 25, 2016 at 10:40 PM, Mahesh Dananjaya <
dananjayamah...@gmail.com> wrote:

> Hi Maheshakya,
> Thank you very much for the support given during the last couple of
> weeks. I have finally submitted the proposal to the site, and I am looking
> forward to contributing to WSO2 ML. Thank you.
> regards,
> Mahesh.
>
> On Fri, Mar 25, 2016 at 7:49 PM, Mahesh Dananjaya <
> dananjayamah...@gmail.com> wrote:
>
>> Hi maheshakya,
>> I added the timeline to the best of my knowledge and uploaded it. Please
>> check. Thank you.
>> regards,
>> Mahesh.
>>
>> On Fri, Mar 25, 2016 at 7:09 PM, Maheshakya Wijewardena <
>> mahesha...@wso2.com> wrote:
>>
>>> Hi Mahesh,
>>>
>>> Can you add the timeline of the project as I've mentioned? It's one of
>>> the crucial parts of the proposal, as it allows us to evaluate the
>>> feasibility of the project within the time period given by Google.
>>>
>>> Best regards.
>>>
>>> On Fri, Mar 25, 2016 at 6:53 PM, Mahesh Dananjaya <
>>> dananjayamah...@gmail.com> wrote:
>>>
>>>>
>>>> -- Forwarded message --
>>>> From: Mahesh Dananjaya <dananjayamah...@gmail.com>
>>>> Date: Fri, Mar 25, 2016 at 7:02 PM
>>>> Subject: Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]
>>>> To: Maheshakya Wijewardena <mahesha...@wso2.com>
>>>>
>>>>
>>>> Hi maheshakya,
>>>> I have uploaded my final submission. Here it is. Please check it and
>>>> let me know if there is anything I need to change. Thank you.
>>>> BR,
>>>> Mahesh.
>>>>
>>>> On Fri, Mar 25, 2016 at 6:28 PM, Mahesh Dananjaya <
>>>> dananjayamah...@gmail.com> wrote:
>>>>
>>>>> Hi Maheshakya,
>>>>> Thank you very much. I will update the proposal with those
>>>>> changes and submit it now. Thank you.
>>>>> regards,
>>>>> Mahesh.
>>>>>
>>>>> On Fri, Mar 25, 2016 at 6:07 PM, Maheshakya Wijewardena <
>>>>> mahesha...@wso2.com> wrote:
>>>>>
>>>>>> Hi Mahesh,
>>>>>>
>>>>>> In the title, please include both tags [ML] and [CEP]
>>>>>>
>>>>>> Best regards.
>>>>>>
>>>>>> On Fri, Mar 25, 2016 at 5:49 PM, Maheshakya Wijewardena <
>>>>>> mahesha...@wso2.com> wrote:
>>>>>>
>>>>>>> Also, please include an introduction to yourself (university,
>>>>>>> department), past experience in machine learning, language proficiency,
>>>>>>> etc. at the beginning of the proposal.
>>>>>>>
>>>>>>> Best regards.
>>>>>>>
>>>>>>> On Fri, Mar 25, 2016 at 5:47 PM, Maheshakya Wijewardena <
>>>>>>> mahesha...@wso2.com> wrote:
>>>>>>>
>>>>>>>> Hi Mahesh,
>>>>>>>>
>>>>>>>> Thank you for sending the draft. Please submit it as soon as
>>>>>>>> possible.
>>>>>>>>
>>>>>>>> A few high-level comments:
>>>>>>>>
>>>>>>>> In the proposal, you must specifically mention that this will be
>>>>>>>> implemented as a Siddhi extension that can operate directly on incoming
>>>>>>>> streams.
>>>>>>>>
>>>>>>>> Also, you need to have a timeline for the project. A sample looks
>>>>>>>> like:
>>>>>>>>
>>>>>>>> May 1- May 20 - Community bonding period - Getting familiar with
>>>>>>>> the platform and discussing implementation methods.
>>>>>>>> May 20 - May 30 - Implementing streaming k-means,
>>>>>>>> -
>>>>>>>> -
>>>>>>>> July 20-24 - Writing examples
>>>>>>>> July 24-18 - Documentation
>>>>>>>>
>>>>>>>> This s

Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]

2016-03-25 Thread Maheshakya Wijewardena
Hi Mahesh,

Can you add the timeline of the project as I've mentioned? It's one of the
crucial parts of the proposal, as it allows us to evaluate the feasibility
of the project within the time period given by Google.

Best regards.

On Fri, Mar 25, 2016 at 6:53 PM, Mahesh Dananjaya <dananjayamah...@gmail.com
> wrote:

>
> -- Forwarded message --
> From: Mahesh Dananjaya <dananjayamah...@gmail.com>
> Date: Fri, Mar 25, 2016 at 7:02 PM
> Subject: Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]
> To: Maheshakya Wijewardena <mahesha...@wso2.com>
>
>
> Hi maheshakya,
> I have uploaded my final submission. Here it is. Please check it and let
> me know if there is anything I need to change. Thank you.
> BR,
> Mahesh.
>
> On Fri, Mar 25, 2016 at 6:28 PM, Mahesh Dananjaya <
> dananjayamah...@gmail.com> wrote:
>
>> Hi Maheshakya,
>> Thank you very much. I will update the proposal with those changes
>> and submit it now. Thank you.
>> regards,
>> Mahesh.
>>
>> On Fri, Mar 25, 2016 at 6:07 PM, Maheshakya Wijewardena <
>> mahesha...@wso2.com> wrote:
>>
>>> Hi Mahesh,
>>>
>>> In the title, please include both tags [ML] and [CEP]
>>>
>>> Best regards.
>>>
>>> On Fri, Mar 25, 2016 at 5:49 PM, Maheshakya Wijewardena <
>>> mahesha...@wso2.com> wrote:
>>>
>>>> Also, please include an introduction to yourself (University,
>>>> department), past experience in machine learning, language proficiency, etc
>>>> at the beginning of the proposal.
>>>>
>>>> Best regards.
>>>>
>>>> On Fri, Mar 25, 2016 at 5:47 PM, Maheshakya Wijewardena <
>>>> mahesha...@wso2.com> wrote:
>>>>
>>>>> Hi Mahesh,
>>>>>
>>>>> Thank you for sending the draft. Please submit it as soon as possible.
>>>>>
>>>>> A few high-level comments:
>>>>>
>>>>> In the proposal, you must specifically mention that this will be
>>>>> implemented as a Siddhi extension that can operate directly on incoming
>>>>> streams.
>>>>>
>>>>> Also, you need to have a timeline for the project. A sample looks
>>>>> like:
>>>>>
>>>>> May 1- May 20 - Community bonding period - Getting familiar with the
>>>>> platform and discussing implementation methods.
>>>>> May 20 - May 30 - Implementing streaming k-means,
>>>>> -
>>>>> -
>>>>> July 20-24 - Writing examples
>>>>> July 24-18 - Documentation
>>>>>
>>>>> This should end before the pencils-down date. Refer to the correct
>>>>> timeline given on the GSoC site.
>>>>>
>>>>> The implementation details of the streaming algorithms look fine.
>>>>>
>>>>> Best regards.
>>>>>
>>>>>
>>>>> On Fri, Mar 25, 2016 at 5:23 PM, Mahesh Dananjaya <
>>>>> dananjayamah...@gmail.com> wrote:
>>>>>
>>>>>> Hi Maheshakya,
>>>>>> This is my draft proposal.
>>>>>>
>>>>>> https://docs.google.com/document/d/1apZfEXZXEH5GwSwS7hARINbGw5_zinxWdZjEmyqfKu4/edit?usp=sharing
>>>>>> Can you please check this and see whether it is correct? Thank you.
>>>>>> BR,
>>>>>> Mahesh
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 21, 2016 at 1:15 PM, Maheshakya Wijewardena <
>>>>>> mahesha...@wso2.com> wrote:
>>>>>>
>>>>>>> Hi Mahesh,
>>>>>>>
>>>>>>> The deadline for submitting your proposals is on March 25th, 2016,
>>>>>>> therefore please start writing the proposal and get feedback.
>>>>>>>
>>>>>>> Best regards.
>>>>>>>
>>>>>>> On Tue, Mar 15, 2016 at 4:14 PM, Mahesh Dananjaya <
>>>>>>> dananjayamah...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Maheshakya,
>>>>>>>> OK. I have been trying some examples, splitting the data and training
>>>>>>>> incrementally. I am still working on that. I have been adding them to my GitHub

Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]

2016-03-25 Thread Maheshakya Wijewardena
Also, please include an introduction to yourself (university, department),
past experience in machine learning, language proficiency, etc. at the
beginning of the proposal.

Best regards.

On Fri, Mar 25, 2016 at 5:47 PM, Maheshakya Wijewardena  wrote:

> Hi Mahesh,
>
> Thank you for sending the draft. Please submit it as soon as possible.
>
> A few high-level comments:
>
> In the proposal, you must specifically mention that this will be
> implemented as a Siddhi extension that can operate directly on incoming
> streams.
>
> Also, you need to have a timeline for the project. A sample looks like:
>
> May 1- May 20 - Community bonding period - Getting familiar with the
> platform and discussing implementation methods.
> May 20 - May 30 - Implementing streaming k-means,
> -
> -
> July 20-24 - Writing examples
> July 24-18 - Documentation
>
> This should end before the pencils-down date. Refer to the correct timeline
> given on the GSoC site.
>
> The implementation details of the streaming algorithms look fine.
>
> Best regards.
>
>
> On Fri, Mar 25, 2016 at 5:23 PM, Mahesh Dananjaya <
> dananjayamah...@gmail.com> wrote:
>
>> Hi Maheshakya,
>> This is my draft proposal.
>>
>> https://docs.google.com/document/d/1apZfEXZXEH5GwSwS7hARINbGw5_zinxWdZjEmyqfKu4/edit?usp=sharing
>> Can you please check this and see whether it is correct? Thank you.
>> BR,
>> Mahesh
>>
>>
>> On Mon, Mar 21, 2016 at 1:15 PM, Maheshakya Wijewardena <
>> mahesha...@wso2.com> wrote:
>>
>>> Hi Mahesh,
>>>
>>> The deadline for submitting your proposals is on March 25th, 2016,
>>> therefore please start writing the proposal and get feedback.
>>>
>>> Best regards.
>>>
>>> On Tue, Mar 15, 2016 at 4:14 PM, Mahesh Dananjaya <
>>> dananjayamah...@gmail.com> wrote:
>>>
 Hi Maheshakya,
 OK. I have been trying some examples, splitting the data and training
 incrementally. I am still working on that, and I have been adding the
 examples to my GitHub repo too (https://github.com/dananjayamahesh/GSOC2016).
 I saw that Spark only has Scala API support for those streaming
 algorithms, so my task is to develop a Java API. I will let you know my
 progress. Thank you very much.
 BR,
 Mahesh

 On Tue, Mar 15, 2016 at 3:21 PM, Maheshakya Wijewardena <
 mahesha...@wso2.com> wrote:

> Hi Mahesh,
>
> No, you don't need to use Hadoop at any stage in this project.
> Everything you need is in Spark (regarding ML algorithms).
> You can also use Spark MLlib's methods to randomly split datasets.
>
> Best regards.
>
> On Mon, Mar 14, 2016 at 1:28 PM, Mahesh Dananjaya <
> dananjayamah...@gmail.com> wrote:
>
>> Hi Maheshakya,
>> I am writing some Java programs that break the dataset into several
>> pieces and train a model repeatedly with those pieces using Spark MLlib.
>> Do I have to do anything with Hadoop at this stage? I am working in
>> standalone mode. Thank you.
>> BR,
>> Mahesh.
>>
>> On Sun, Mar 13, 2016 at 6:30 PM, Maheshakya Wijewardena <
>> mahesha...@wso2.com> wrote:
>>
>>> Hi Mahesh,
>>>
>>> You don't have to look into carbon-ml.
>>>
>>> Best regards.
>>>
>>> On Sun, Mar 13, 2016 at 5:49 PM, Mahesh Dananjaya <
>>> dananjayamah...@gmail.com> wrote:
>>>
 Hi maheshakya,
 I am working on some examples related to Spark and ML. Is there
 anything to do with carbon-ml? I think I don't need to look into that
 one. Do I?
 BR,
 Mahesh

 On Tue, Mar 8, 2016 at 11:55 AM, Maheshakya Wijewardena <
 mahesha...@wso2.com> wrote:

> Hi Mahesh,
>
> Is that Scala API included in your current product or repo?
>
>
> No, we don't have the Scala API included. What we want is to design
> Java implementations of those algorithms that train on mini-batches of
> streaming data with the help of the aforementioned methods, so that we
> can include them as a CEP extension.
>
> To clarify, please try to write a simple Java program using Spark MLlib
> linear regression and k-means clustering with a sample data set (you can
> find a lot of data sets in the UCI repo [1]). You need to break the
> dataset into several pieces and train a model repeatedly with those.
> After each training run, save the model information (such as weights and
> intercepts for regression, and cluster centers for clustering; please
> check the arguments of the methods I have mentioned and save the
> information they require).
> When training a model with a new piece of data, use those methods to
> initialize and put 

Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]

2016-03-25 Thread Maheshakya Wijewardena
Hi Mahesh,

Thank you for sending the draft. Please submit it as soon as possible.

A few high-level comments:

In the proposal, you must specifically mention that this will be
implemented as a Siddhi extension that can operate directly on incoming
streams.

Also, you need to have a timeline for the project. A sample looks like:

May 1- May 20 - Community bonding period - Getting familiar with the
platform and discussing implementation methods.
May 20 - May 30 - Implementing streaming k-means,
-
-
July 20-24 - Writing examples
July 24-18 - Documentation

This should end before the pencils-down date. Refer to the correct timeline
given on the GSoC site.

The implementation details of the streaming algorithms look fine.

Best regards.


On Fri, Mar 25, 2016 at 5:23 PM, Mahesh Dananjaya  wrote:

> Hi Maheshakya,
> This is my draft proposal.
>
> https://docs.google.com/document/d/1apZfEXZXEH5GwSwS7hARINbGw5_zinxWdZjEmyqfKu4/edit?usp=sharing
> Can you please check this and see whether it is correct? Thank you.
> BR,
> Mahesh
>
>
> On Mon, Mar 21, 2016 at 1:15 PM, Maheshakya Wijewardena <
> mahesha...@wso2.com> wrote:
>
>> Hi Mahesh,
>>
>> The deadline for submitting your proposals is on March 25th, 2016,
>> therefore please start writing the proposal and get feedback.
>>
>> Best regards.
>>
>> On Tue, Mar 15, 2016 at 4:14 PM, Mahesh Dananjaya <
>> dananjayamah...@gmail.com> wrote:
>>
>>> Hi Maheshakya,
>>> OK. I have been trying some examples, splitting the data and training
>>> incrementally. I am still working on that, and I have been adding the
>>> examples to my GitHub repo too (https://github.com/dananjayamahesh/GSOC2016).
>>> I saw that Spark only has Scala API support for those streaming
>>> algorithms, so my task is to develop a Java API. I will let you know my
>>> progress. Thank you very much.
>>> BR,
>>> Mahesh
>>>
>>> On Tue, Mar 15, 2016 at 3:21 PM, Maheshakya Wijewardena <
>>> mahesha...@wso2.com> wrote:
>>>
 Hi Mahesh,

 No, you don't need to use Hadoop at any stage in this project.
 Everything you need is in Spark (regarding ML algorithms).
 You can also use Spark MLlib's methods to randomly split datasets.

 Best regards.

 On Mon, Mar 14, 2016 at 1:28 PM, Mahesh Dananjaya <
 dananjayamah...@gmail.com> wrote:

> Hi Maheshakya,
> I am writing some Java programs that break the dataset into several
> pieces and train a model repeatedly with those pieces using Spark MLlib.
> Do I have to do anything with Hadoop at this stage? I am working in
> standalone mode. Thank you.
> BR,
> Mahesh.
>
> On Sun, Mar 13, 2016 at 6:30 PM, Maheshakya Wijewardena <
> mahesha...@wso2.com> wrote:
>
>> Hi Mahesh,
>>
>> You don't have to look into carbon-ml.
>>
>> Best regards.
>>
>> On Sun, Mar 13, 2016 at 5:49 PM, Mahesh Dananjaya <
>> dananjayamah...@gmail.com> wrote:
>>
>>> Hi maheshakya,
>>> I am working on some examples related to Spark and ML. Is there
>>> anything to do with carbon-ml? I think I don't need to look into that
>>> one. Do I?
>>> BR,
>>> Mahesh
>>>
>>> On Tue, Mar 8, 2016 at 11:55 AM, Maheshakya Wijewardena <
>>> mahesha...@wso2.com> wrote:
>>>
 Hi Mahesh,

 Is that Scala API included in your current product or repo?


 No, we don't have the Scala API included. What we want is to design
 Java implementations of those algorithms that train on mini-batches of
 streaming data with the help of the aforementioned methods, so that we
 can include them as a CEP extension.

 To clarify, please try to write a simple Java program using Spark MLlib
 linear regression and k-means clustering with a sample data set (you can
 find a lot of data sets in the UCI repo [1]). You need to break the
 dataset into several pieces and train a model repeatedly with those.
 After each training run, save the model information (such as weights and
 intercepts for regression, and cluster centers for clustering; please
 check the arguments of the methods I have mentioned and save the
 information they require).
 When training a model with a new piece of data, use those methods to
 initialize it with the saved values as arguments. This way you can
 start from where you stopped in the previous run.
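
The save-and-resume loop described above can be sketched for the regression case as follows. This is only a plain-Java illustration under stated assumptions, not Spark MLlib or WSO2 code: the class `WarmStartRegressionSketch`, its `State` holder, the `trainOnPiece` method, the step size, and the tiny data set are all hypothetical names and values chosen for the example. It uses ordinary batch gradient descent on squared error as a stand-in for whatever optimizer the real library applies.

```java
import java.util.Arrays;

/** Plain-Java illustration only; none of this is Spark MLlib or WSO2 code. */
class WarmStartRegressionSketch {

    /** The information saved after each training run: weights and intercept. */
    static final class State {
        final double[] weights;
        final double intercept;

        State(double[] weights, double intercept) {
            this.weights = weights.clone();
            this.intercept = intercept;
        }
    }

    /** Linear prediction: intercept plus the dot product of weights and features. */
    static double predict(double[] weights, double intercept, double[] features) {
        double sum = intercept;
        for (int j = 0; j < weights.length; j++) {
            sum += weights[j] * features[j];
        }
        return sum;
    }

    /**
     * Trains on one piece of the data with batch gradient descent on squared
     * error, starting from the saved state instead of from scratch.
     */
    static State trainOnPiece(State saved, double[][] x, double[] y, int epochs, double stepSize) {
        double[] w = saved.weights.clone();
        double b = saved.intercept;
        int n = x.length;
        for (int epoch = 0; epoch < epochs; epoch++) {
            double[] gradW = new double[w.length];
            double gradB = 0.0;
            for (int i = 0; i < n; i++) {
                double error = predict(w, b, x[i]) - y[i];
                for (int j = 0; j < w.length; j++) {
                    gradW[j] += error * x[i][j];
                }
                gradB += error;
            }
            for (int j = 0; j < w.length; j++) {
                w[j] -= stepSize * gradW[j] / n;
            }
            b -= stepSize * gradB / n;
        }
        return new State(w, b);
    }

    public static void main(String[] args) {
        // Two pieces of a made-up data set generated from y = 2x + 1.
        double[][] x1 = {{0.0}, {0.25}, {0.5}, {0.75}};
        double[] y1 = {1.0, 1.5, 2.0, 2.5};
        double[][] x2 = {{0.1}, {0.4}, {0.6}, {0.9}};
        double[] y2 = {1.2, 1.8, 2.2, 2.8};

        State state = new State(new double[] {0.0}, 0.0); // first run starts from zeros
        state = trainOnPiece(state, x1, y1, 200, 0.5);     // save the result of run 1
        state = trainOnPiece(state, x2, y2, 200, 0.5);     // run 2 resumes from the saved state
        System.out.println(Arrays.toString(state.weights) + " " + state.intercept);
    }
}
```

The only state carried between runs is the weight vector and the intercept, which is why those are the values worth saving after each piece.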

 Let us know your observations and feel free to ask if you need to
 know anything more on this.

 We'll let you know what needs to be done to include this in CEP.

 Best regards.

 On Tue, Mar 8, 2016 at 10:59 AM, Mahesh Dananjaya <
 

Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]

2016-03-24 Thread Supun Sethunga
Hi Mahesh,

Please submit your final proposal to GSoC, before the deadline.

Regards,
Supun

On Mon, Mar 21, 2016 at 1:00 PM, Maheshakya Wijewardena  wrote:

> Hi Mahesh,
>
> The deadline for submitting your proposals is on March 25th, 2016,
> therefore please start writing the proposal and get feedback.
>
> Best regards.
>
> On Tue, Mar 15, 2016 at 4:14 PM, Mahesh Dananjaya <
> dananjayamah...@gmail.com> wrote:
>
>> Hi Maheshakya,
>> OK. I have been trying some examples, splitting the data and training
>> incrementally. I am still working on that, and I have been adding the
>> examples to my GitHub repo too (https://github.com/dananjayamahesh/GSOC2016).
>> I saw that Spark only has Scala API support for those streaming
>> algorithms, so my task is to develop a Java API. I will let you know my
>> progress. Thank you very much.
>> BR,
>> Mahesh
>>
>> On Tue, Mar 15, 2016 at 3:21 PM, Maheshakya Wijewardena <
>> mahesha...@wso2.com> wrote:
>>
>>> Hi Mahesh,
>>>
>>> No, you don't need to use Hadoop at any stage in this project. Everything
>>> you need is in Spark (regarding ML algorithms).
>>> You can also use Spark MLlib's methods to randomly split datasets.
>>>
>>> Best regards.
>>>
>>> On Mon, Mar 14, 2016 at 1:28 PM, Mahesh Dananjaya <
>>> dananjayamah...@gmail.com> wrote:
>>>
 Hi Maheshakya,
 I am writing some Java programs that break the dataset into several
 pieces and train a model repeatedly with those pieces using Spark MLlib.
 Do I have to do anything with Hadoop at this stage? I am working in
 standalone mode. Thank you.
 BR,
 Mahesh.

 On Sun, Mar 13, 2016 at 6:30 PM, Maheshakya Wijewardena <
 mahesha...@wso2.com> wrote:

> Hi Mahesh,
>
> You don't have to look into carbon-ml.
>
> Best regards.
>
> On Sun, Mar 13, 2016 at 5:49 PM, Mahesh Dananjaya <
> dananjayamah...@gmail.com> wrote:
>
>> Hi maheshakya,
>> I am working on some examples related to Spark and ML. Is there
>> anything to do with carbon-ml? I think I don't need to look into that
>> one. Do I?
>> BR,
>> Mahesh
>>
>> On Tue, Mar 8, 2016 at 11:55 AM, Maheshakya Wijewardena <
>> mahesha...@wso2.com> wrote:
>>
>>> Hi Mahesh,
>>>
>>> Is that Scala API included in your current product or repo?
>>>
>>>
>>> No, we don't have the Scala API included. What we want is to design
>>> Java implementations of those algorithms that train on mini-batches of
>>> streaming data with the help of the aforementioned methods, so that we
>>> can include them as a CEP extension.
>>>
>>> To clarify, please try to write a simple Java program using Spark MLlib
>>> linear regression and k-means clustering with a sample data set (you can
>>> find a lot of data sets in the UCI repo [1]). You need to break the
>>> dataset into several pieces and train a model repeatedly with those.
>>> After each training run, save the model information (such as weights and
>>> intercepts for regression, and cluster centers for clustering; please
>>> check the arguments of the methods I have mentioned and save the
>>> information they require).
>>> When training a model with a new piece of data, use those methods to
>>> initialize it with the saved values as arguments. This way you can
>>> start from where you stopped in the previous run.
>>>
>>> Let us know your observations and feel free to ask if you need to
>>> know anything more on this.
>>>
>>> We'll let you know what needs to be done to include this in CEP.
>>>
>>> Best regards.
>>>
>>> On Tue, Mar 8, 2016 at 10:59 AM, Mahesh Dananjaya <
>>> dananjayamah...@gmail.com> wrote:
>>>
 Hi Maheshakya,
 Great, thank you. I already have ML and CEP and am working further with
 them. Is that Scala API included in your current product or repo? Thank
 you.
 BR,
 Mahesh.

 On Sun, Mar 6, 2016 at 5:49 PM, Maheshakya Wijewardena <
 mahesha...@wso2.com> wrote:

> Hi Mahesh,
>
> Please find the comments inline.
>
>> Is the data stream taken into ML in the event publisher's format,
>> through the event publisher? Or can we use the direct traffic that
>> comes to the event receiver, or else as streams?
>>
> We intend to use the direct data as event streams.
>
>> 1.) Is the data coming from WSO2 DAS to ML arriving as streams?
>>
> No, WSO2 ML doesn't use any event streams. The data stored in tables
> in DAS is loaded into ML.
>
>> 2.) Are there any incremental learning algorithms currently active
>> in ML? You mentioned that there are and that they use the Scala API. So
>> there is streaming support with that Scala API. In

Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]

2016-03-15 Thread Maheshakya Wijewardena
Hi Mahesh,

No, you don't need to use Hadoop at any stage in this project. Everything
you need is in Spark (regarding ML algorithms).
You can also use Spark MLlib's methods to randomly split datasets.
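
To make the idea of randomly splitting a dataset into pieces concrete, here is a minimal plain-Java sketch. It deliberately does not call Spark; in Spark itself the Java RDD API offers a `randomSplit` method for this purpose, which is presumably what is meant above. The class `RandomSplitSketch`, the `split` method, and its parameters are hypothetical names for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Plain-Java illustration only; this is not the Spark API. */
class RandomSplitSketch {

    /**
     * Shuffles a copy of the data with a fixed seed and deals the rows
     * round-robin into the requested number of pieces.
     */
    static <T> List<List<T>> split(List<T> data, int pieces, long seed) {
        if (pieces <= 0) {
            throw new IllegalArgumentException("pieces must be positive");
        }
        List<T> shuffled = new ArrayList<>(data);          // leave the input untouched
        Collections.shuffle(shuffled, new Random(seed));   // same seed, same split
        List<List<T>> result = new ArrayList<>();
        for (int p = 0; p < pieces; p++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < shuffled.size(); i++) {
            result.get(i % pieces).add(shuffled.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(i);
        }
        // Three pieces that can be fed to a model one after another.
        System.out.println(split(rows, 3, 42L));
    }
}
```

Fixing the seed keeps the pieces reproducible between experiments, which makes it easier to compare incremental training runs.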

Best regards.

On Mon, Mar 14, 2016 at 1:28 PM, Mahesh Dananjaya  wrote:

> Hi Maheshakya,
> I am writing some Java programs that break the dataset into several
> pieces and train a model repeatedly with those pieces using Spark MLlib.
> Do I have to do anything with Hadoop at this stage? I am working in
> standalone mode. Thank you.
> BR,
> Mahesh.
>
> On Sun, Mar 13, 2016 at 6:30 PM, Maheshakya Wijewardena <
> mahesha...@wso2.com> wrote:
>
>> Hi Mahesh,
>>
>> You don't have to look into carbon-ml.
>>
>> Best regards.
>>
>> On Sun, Mar 13, 2016 at 5:49 PM, Mahesh Dananjaya <
>> dananjayamah...@gmail.com> wrote:
>>
>>> Hi maheshakya,
>>> I am working on some examples related to Spark and ML. Is there anything
>>> to do with carbon-ml? I think I don't need to look into that one. Do I?
>>> BR,
>>> Mahesh
>>>
>>> On Tue, Mar 8, 2016 at 11:55 AM, Maheshakya Wijewardena <
>>> mahesha...@wso2.com> wrote:
>>>
 Hi Mahesh,

 Is that Scala API included in your current product or repo?


 No, we don't have the Scala API included. What we want is to design
 Java implementations of those algorithms that train on mini-batches of
 streaming data with the help of the aforementioned methods, so that we
 can include them as a CEP extension.

 To clarify, please try to write a simple Java program using Spark MLlib
 linear regression and k-means clustering with a sample data set (you can
 find a lot of data sets in the UCI repo [1]). You need to break the
 dataset into several pieces and train a model repeatedly with those.
 After each training run, save the model information (such as weights and
 intercepts for regression, and cluster centers for clustering; please
 check the arguments of the methods I have mentioned and save the
 information they require).
 When training a model with a new piece of data, use those methods to
 initialize it with the saved values as arguments. This way you can
 start from where you stopped in the previous run.
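
For the clustering case, the same save-and-resume idea can be sketched by keeping the cluster centers (plus how many points each has absorbed) between pieces. This is a plain-Java illustration under assumptions, not Spark MLlib code: `WarmStartKMeansSketch`, `State`, `update`, and the sample points are hypothetical, and the running-mean update used here is one simple choice rather than the library's exact rule.

```java
/** Plain-Java illustration only; none of this is Spark MLlib or WSO2 code. */
class WarmStartKMeansSketch {

    /** The information saved after each run: centers and points seen per cluster. */
    static final class State {
        final double[][] centers;
        final long[] counts;

        State(double[][] centers, long[] counts) {
            this.centers = new double[centers.length][];
            for (int c = 0; c < centers.length; c++) {
                this.centers[c] = centers[c].clone();
            }
            this.counts = counts.clone();
        }
    }

    /** Index of the center with the smallest squared Euclidean distance to the point. */
    static int nearest(double[][] centers, double[] point) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int j = 0; j < point.length; j++) {
                double diff = point[j] - centers[c][j];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /**
     * Assigns each point of the new piece to its nearest saved center, then
     * moves every center to the running mean of all points it has seen.
     */
    static State update(State saved, double[][] piece) {
        int k = saved.centers.length;
        int dim = saved.centers[0].length;
        double[][] sums = new double[k][dim];
        long[] batchCounts = new long[k];
        for (double[] point : piece) {
            int c = nearest(saved.centers, point);
            batchCounts[c]++;
            for (int j = 0; j < dim; j++) {
                sums[c][j] += point[j];
            }
        }
        State next = new State(saved.centers, saved.counts); // copy, then adjust
        for (int c = 0; c < k; c++) {
            if (batchCounts[c] == 0) {
                continue; // no new points for this cluster, keep the saved center
            }
            long total = saved.counts[c] + batchCounts[c];
            for (int j = 0; j < dim; j++) {
                next.centers[c][j] = (saved.centers[c][j] * saved.counts[c] + sums[c][j]) / total;
            }
            next.counts[c] = total;
        }
        return next;
    }

    public static void main(String[] args) {
        State state = new State(new double[][] {{1, 1}, {9, 9}}, new long[] {0, 0});
        state = update(state, new double[][] {{0, 0}, {0, 2}, {10, 10}, {10, 12}});
        state = update(state, new double[][] {{2, 0}, {2, 2}, {12, 10}, {12, 12}});
        System.out.println(java.util.Arrays.deepToString(state.centers));
    }
}
```

Because the counts are saved along with the centers, a center after several pieces equals the mean of every point assigned to it so far, so earlier pieces are not forgotten.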

 Let us know your observations and feel free to ask if you need to know
 anything more on this.

 We'll let you know what needs to be done to include this in CEP.

 Best regards.

 On Tue, Mar 8, 2016 at 10:59 AM, Mahesh Dananjaya <
 dananjayamah...@gmail.com> wrote:

> Hi Maheshakya,
> Great, thank you. I already have ML and CEP and am working further with them.
> Is that Scala API included in your current product or repo? Thank you.
> BR,
> Mahesh.
>
> On Sun, Mar 6, 2016 at 5:49 PM, Maheshakya Wijewardena <
> mahesha...@wso2.com> wrote:
>
>> Hi Mahesh,
>>
>> Please find the comments inline.
>>
>>> Is the data stream taken into ML in the event publisher's format,
>>> through the event publisher? Or can we use the direct traffic that
>>> comes to the event receiver, or else as streams?
>>>
>> We intend to use the direct data as event streams.
>>
>>> 1.) Is the data coming from WSO2 DAS to ML arriving as streams?
>>>
>> No, WSO2 ML doesn't use any event streams. The data stored in tables in
>> DAS is loaded into ML.
>>
>>> 2.) Are there any incremental learning algorithms currently active in
>>> ML? You mentioned that there are and that they use the Scala API. So
>>> there is streaming support with that Scala API. In that API, in which
>>> format is the data acquired by ML?
>>>
>> No, there are no incremental learning algorithms in ML. The Scala API
>> is about Spark MLlib. MLlib supports streaming k-means and generalized
>> linear models (linear regression variants and logistic regression)
>> through its Scala API. What those implementations basically do is
>> retrain the trained models with mini-batches as data arrives
>> sequentially. There, the breaking of streaming data into mini-batches
>> is done with the help of Spark Streaming. But we do not intend to use
>> Spark Streaming in our implementation. What we need to do is implement
>> similar behavior for event streams using the Java API. The Java API has
>> the following methods:
>>
>>    - createModel(Vector weights, double intercept) - for GLMs
>>- *setInitialModel
>>
>> 

Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]

2016-03-14 Thread Mahesh Dananjaya
Hi Maheshakya,
I am writing some Java programs that break the dataset into several
pieces and train a model repeatedly with those pieces using Spark MLlib.
Do I have to do anything with Hadoop at this stage? I am working in
standalone mode. Thank you.
BR,
Mahesh.

On Sun, Mar 13, 2016 at 6:30 PM, Maheshakya Wijewardena  wrote:

> Hi Mahesh,
>
> You don't have to look into carbon-ml.
>
> Best regards.
>
> On Sun, Mar 13, 2016 at 5:49 PM, Mahesh Dananjaya <
> dananjayamah...@gmail.com> wrote:
>
>> Hi maheshakya,
>> I am working on some examples related to Spark and ML. Is there anything
>> to do with carbon-ml? I think I don't need to look into that one. Do I?
>> BR,
>> Mahesh
>>
>> On Tue, Mar 8, 2016 at 11:55 AM, Maheshakya Wijewardena <
>> mahesha...@wso2.com> wrote:
>>
>>> Hi Mahesh,
>>>
>>> Is that Scala API included in your current product or repo?
>>>
>>>
>>> No, we don't have the Scala API included. What we want is to design
>>> Java implementations of those algorithms that train on mini-batches of
>>> streaming data with the help of the aforementioned methods, so that we
>>> can include them as a CEP extension.
>>>
>>> To clarify, please try to write a simple Java program using Spark MLlib
>>> linear regression and k-means clustering with a sample data set (you can
>>> find a lot of data sets in the UCI repo [1]). You need to break the
>>> dataset into several pieces and train a model repeatedly with those.
>>> After each training run, save the model information (such as weights and
>>> intercepts for regression, and cluster centers for clustering; please
>>> check the arguments of the methods I have mentioned and save the
>>> information they require).
>>> When training a model with a new piece of data, use those methods to
>>> initialize it with the saved values as arguments. This way you can
>>> start from where you stopped in the previous run.
>>>
>>> Let us know your observations and feel free to ask if you need to know
>>> anything more on this.
>>>
>>> We'll let you know what needs to be done to include this in CEP.
>>>
>>> Best regards.
>>>
>>> On Tue, Mar 8, 2016 at 10:59 AM, Mahesh Dananjaya <
>>> dananjayamah...@gmail.com> wrote:
>>>
 Hi Maheshakya,
 Great, thank you. I already have ML and CEP and am working further with them.
 Is that Scala API included in your current product or repo? Thank you.
 BR,
 Mahesh.

 On Sun, Mar 6, 2016 at 5:49 PM, Maheshakya Wijewardena <
 mahesha...@wso2.com> wrote:

> Hi Mahesh,
>
> Please find the comments inline.
>
>> Is the data stream taken into ML in the event publisher's format,
>> through the event publisher? Or can we use the direct traffic that
>> comes to the event receiver, or else as streams?
>>
> We intend to use the direct data as event streams.
>
>> 1.) Is the data coming from WSO2 DAS to ML arriving as streams?
>>
> No, WSO2 ML doesn't use any event streams. The data stored in tables in
> DAS is loaded into ML.
>
>> 2.) Are there any incremental learning algorithms currently active in
>> ML? You mentioned that there are and that they use the Scala API. So
>> there is streaming support with that Scala API. In that API, in which
>> format is the data acquired by ML?
>>
> No, there are no incremental learning algorithms in ML. The Scala API
> is about Spark MLlib. MLlib supports streaming k-means and generalized
> linear models (linear regression variants and logistic regression)
> through its Scala API. What those implementations basically do is
> retrain the trained models with mini-batches as data arrives
> sequentially. There, the breaking of streaming data into mini-batches
> is done with the help of Spark Streaming. But we do not intend to use
> Spark Streaming in our implementation. What we need to do is implement
> similar behavior for event streams using the Java API. The Java API has
> the following methods:
>
>    - *createModel*(Vector weights, double intercept) - for GLMs
>    - *setInitialModel*(KMeansModel model) - for K means
>
> With the help of these methods, we can train models again with newly
> arriving data, keeping the characteristics learned with the previous 

Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]

2016-03-13 Thread Maheshakya Wijewardena
Hi Mahesh,

You don't have to look into carbon-ml.

Best regards.

On Sun, Mar 13, 2016 at 5:49 PM, Mahesh Dananjaya  wrote:

> Hi maheshakya,
> i am working on some examples related to Spark and ML.is there anything to
> do with carbon-ml. I think i dont need to look into that one.do i?
> BR,
> Mahesh
>
> On Tue, Mar 8, 2016 at 11:55 AM, Maheshakya Wijewardena <
> mahesha...@wso2.com> wrote:
>
>> Hi Mahesh,
>>
>> does that Scala API is with your current product or repo?
>>
>>
>> No, we don't have the Scala API included. What we want is to design the
>> Java implementations of those algorithms to train with mini-batches of
>> streaming data with the help of the aforementioned methods so that we can
>> include in as a CEP extension.
>>
>> As to clarify, please try to write a simple Java program using Spark
>> MLLib linear regression and k-means clustering with a sample data set (You
>> can find alot of data sets from UCI repo[1]).  You need to break the
>> dataset into several pieces and train a model repeatedly with those.
>> After each training run, save the model information (such as weights,
>> intercepts for regression and cluster centers for clustering - please check
>> the arguments of those methods I have mentioned and save the required
>> information of the model)
>> When training a model we a new piece of data, use those methods to
>> initialize and put the save values for the arguments. This way you can
>> start from where you stopped in the previous run.
>>
>> Let us know your observations and feel free to ask if you need to know
>> anything more on this.
>>
>> We'll let you know what needs to be done to include this in CEP.
>>
>> Best regards.
>>
>> On Tue, Mar 8, 2016 at 10:59 AM, Mahesh Dananjaya <
>> dananjayamah...@gmail.com> wrote:
>>
>>> Hi Maheshakya,
>>> great.thank you.i already have ML and CEP and working more towards it.
>>> does that Scala API is with your current product or repo?.  thank you.
>>> BR,
>>> Mahesh.
>>>
>>> On Sun, Mar 6, 2016 at 5:49 PM, Maheshakya Wijewardena <
>>> mahesha...@wso2.com> wrote:
>>>
 Hi Mahesh,

 Please find the comments inline.

 does data stream is taken to ML as the event publisher's format through
> event publisher. Or  we can use direct traffic that comes to event
> receiver, or else as streams
>
 We intend to use the direct data as even streams.

 1.) Those data coming from wso2 DAS to ML are coming as streams?
>
 No, WSO2 ML doesn't use any even stream. The data stored in tables in
 DAS is loaded into ML.

 2.) Are there any incremental learning algorithms currently active in
> ML?you mentioned that there are and they are with scala API. So there is a
> streaming support with that Scala API. In that API which format the data 
> is
> aquired to ML?
>
 No, there are no incremental learning algorithms in ML. The scala API
 is about Spark MLLib. MLLib supports streaming k-means and other
 generalized linear models (linear regression variants and logistic
 regression) with Scala API. What they basically do in those implementations
 is retraining the trained models with mini batches when data sequentially
 arrives. There, the breaking of streaming data into mini batches is done
 with the help of Spark Streaming. But we do not intend to use Spark
 streaming in our implementation. What we need to do is implement a similar
 behavior for event streams using the Java API.  The Java API has the
 following methods:

    - *createModel*(Vector weights, double intercept) - for GLMs
    - *setInitialModel*(KMeansModel model) - for K means

 With the help of these methods, we can train models again with newly
 arriving data, keeping the characteristics learned with the previous data.
 When implementing this, we need to pay attention to other parameters of
 incremental learning such as data horizon and data obsolescence (indicated
 in the project ideas page).
 We need to discuss on how to add these with CEP event streams. I have
 added Suho into the thread for more clarification.

 Best regards.


 On Sat, Mar 5, 2016 at 5:15 PM, Mahesh Dananjaya <
 dananjayamah...@gmail.com> wrote:

> Hi maheshakya,
> as we concerned to use 

Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]

2016-03-08 Thread Mahesh Dananjaya
Hi Maheshakya,
Thank you very much. I am already on to that and will let you know soon.
Thank you.
BR,
mahesh.

On Tue, Mar 8, 2016 at 11:55 AM, Maheshakya Wijewardena  wrote:

> Hi Mahesh,
>
> does that Scala API is with your current product or repo?
>
>
> No, we don't have the Scala API included. What we want is to design the
> Java implementations of those algorithms to train with mini-batches of
> streaming data with the help of the aforementioned methods so that we can
> include in as a CEP extension.
>
> As to clarify, please try to write a simple Java program using Spark MLLib
> linear regression and k-means clustering with a sample data set (You can
> find alot of data sets from UCI repo[1]).  You need to break the dataset
> into several pieces and train a model repeatedly with those.
> After each training run, save the model information (such as weights,
> intercepts for regression and cluster centers for clustering - please check
> the arguments of those methods I have mentioned and save the required
> information of the model)
> When training a model we a new piece of data, use those methods to
> initialize and put the save values for the arguments. This way you can
> start from where you stopped in the previous run.
>
> Let us know your observations and feel free to ask if you need to know
> anything more on this.
>
> We'll let you know what needs to be done to include this in CEP.
>
> Best regards.
>
> On Tue, Mar 8, 2016 at 10:59 AM, Mahesh Dananjaya <
> dananjayamah...@gmail.com> wrote:
>
>> Hi Maheshakya,
>> great.thank you.i already have ML and CEP and working more towards it.
>> does that Scala API is with your current product or repo?.  thank you.
>> BR,
>> Mahesh.
>>
>> On Sun, Mar 6, 2016 at 5:49 PM, Maheshakya Wijewardena <
>> mahesha...@wso2.com> wrote:
>>
>>> Hi Mahesh,
>>>
>>> Please find the comments inline.
>>>
>>> does data stream is taken to ML as the event publisher's format through
 event publisher. Or  we can use direct traffic that comes to event
 receiver, or else as streams

>>> We intend to use the direct data as even streams.
>>>
>>> 1.) Those data coming from wso2 DAS to ML are coming as streams?

>>> No, WSO2 ML doesn't use any even stream. The data stored in tables in
>>> DAS is loaded into ML.
>>>
>>> 2.) Are there any incremental learning algorithms currently active in
 ML?you mentioned that there are and they are with scala API. So there is a
 streaming support with that Scala API. In that API which format the data is
 aquired to ML?

>>> No, there are no incremental learning algorithms in ML. The scala API is
>>> about Spark MLLib. MLLib supports streaming k-means and other generalized
>>> linear models (linear regression variants and logistic regression) with
>>> Scala API. What they basically do in those implementations is retraining
>>> the trained models with mini batches when data sequentially arrives. There,
>>> the breaking of streaming data into mini batches is done with the help of
>>> Spark Streaming. But we do not intend to use Spark streaming in our
>>> implementation. What we need to do is implement a similar behavior for
>>> event streams using the Java API.  The Java API has the following methods:
>>>
>>>    - *createModel*(Vector weights, double intercept) - for GLMs
>>>    - *setInitialModel*(KMeansModel model) - for K means
>>>
>>> With the help of these methods, we can train models again with newly
>>> arriving data, keeping the characteristics learned with the previous data.
>>> When implementing this, we need to pay attention to other parameters of
>>> incremental learning such as data horizon and data obsolescence (indicated
>>> in the project ideas page).
>>> We need to discuss on how to add these with CEP event streams. I have
>>> added Suho into the thread for more clarification.
>>>
>>> Best regards.
>>>
>>>
>>> On Sat, Mar 5, 2016 at 5:15 PM, Mahesh Dananjaya <
>>> dananjayamah...@gmail.com> wrote:
>>>
 Hi maheshakya,
 as we concerned to use WSO2 CEP to handle streaming data and implement
 the machine learning algorithms with Spark MLLib, does data stream is taken
 to ML as the event publisher's format through event publisher. Or  we can
 use direct traffic that comes to event receiver, or else as streams.
 referring to https://docs.wso2.com/display/CEP410/User+Guide

Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]

2016-03-07 Thread Maheshakya Wijewardena
Hi Mahesh,

> does that Scala API is with your current product or repo?


No, we don't have the Scala API included. What we want is to design the
Java implementations of those algorithms to train with mini-batches of
streaming data with the help of the aforementioned methods, so that we can
include it as a CEP extension.

To clarify, please try to write a simple Java program using Spark MLLib
linear regression and k-means clustering with a sample data set (you can
find a lot of data sets in the UCI repo[1]). You need to break the dataset
into several pieces and train a model repeatedly with those.
After each training run, save the model information (such as weights and
intercepts for regression, and cluster centers for clustering; please check
the arguments of the methods I have mentioned and save the required
information of the model).
When training a model with a new piece of data, use those methods to
initialize it, passing the saved values as the arguments. This way you can
start from where you stopped in the previous run.
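
The train, save and resume cycle described above can be sketched in plain
Java without Spark. This is only an illustration under stated assumptions:
the class WarmStartRegression, its Model holder and trainOnPiece are
hypothetical names, and the optimizer is ordinary gradient descent over each
piece rather than MLLib's SGD. The saved state is deliberately the same pair
of values (weights, intercept) that createModel takes.

```java
/** Plain-Java sketch: resume linear-regression training across pieces of a data set. */
final class WarmStartRegression {

    /** Saved model state: the two values that createModel(weights, intercept) accepts. */
    static final class Model {
        final double[] weights;
        final double intercept;

        Model(double[] weights, double intercept) {
            this.weights = weights.clone();
            this.intercept = intercept;
        }
    }

    static double predict(Model m, double[] features) {
        double sum = m.intercept;
        for (int j = 0; j < features.length; j++) {
            sum += m.weights[j] * features[j];
        }
        return sum;
    }

    static double meanSquaredError(Model m, double[][] x, double[] y) {
        double total = 0.0;
        for (int i = 0; i < x.length; i++) {
            double err = predict(m, x[i]) - y[i];
            total += err * err;
        }
        return total / x.length;
    }

    /** Gradient descent over one piece, starting from the saved model instead of from scratch. */
    static Model trainOnPiece(Model start, double[][] x, double[] y, int iterations, double stepSize) {
        Model current = start;
        int n = x.length;
        int d = start.weights.length;
        for (int it = 0; it < iterations; it++) {
            double[] gradW = new double[d];
            double gradB = 0.0;
            for (int i = 0; i < n; i++) {
                double err = predict(current, x[i]) - y[i];
                for (int j = 0; j < d; j++) {
                    gradW[j] += err * x[i][j];
                }
                gradB += err;
            }
            double[] w = current.weights.clone();
            for (int j = 0; j < d; j++) {
                w[j] -= stepSize * gradW[j] / n;
            }
            current = new Model(w, current.intercept - stepSize * gradB / n);
        }
        return current;
    }

    public static void main(String[] args) {
        // Two pieces of a toy data set generated from y = 2x + 1 (made-up data).
        double[][] piece1 = {{0.0}, {0.25}, {0.5}, {0.75}};
        double[] labels1 = {1.0, 1.5, 2.0, 2.5};
        double[][] piece2 = {{0.1}, {0.4}, {0.6}, {1.0}};
        double[] labels2 = {1.2, 1.8, 2.2, 3.0};

        Model saved = new Model(new double[] {0.0}, 0.0);
        saved = trainOnPiece(saved, piece1, labels1, 20, 0.5); // first run
        System.out.println("MSE on piece 2 before resuming: " + meanSquaredError(saved, piece2, labels2));
        saved = trainOnPiece(saved, piece2, labels2, 20, 0.5); // resumes from the saved state
        System.out.println("MSE on piece 2 after resuming:  " + meanSquaredError(saved, piece2, labels2));
    }
}
```

In a Spark-based version, the Model holder would be replaced by whatever the
MLLib training call returns, and the saved weights and intercept would be
fed back through the methods mentioned earlier in this thread.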

Let us know your observations and feel free to ask if you need to know
anything more on this.

We'll let you know what needs to be done to include this in CEP.

Best regards.

On Tue, Mar 8, 2016 at 10:59 AM, Mahesh Dananjaya  wrote:

> Hi Maheshakya,
> great.thank you.i already have ML and CEP and working more towards it.
> does that Scala API is with your current product or repo?.  thank you.
> BR,
> Mahesh.
>
> On Sun, Mar 6, 2016 at 5:49 PM, Maheshakya Wijewardena <
> mahesha...@wso2.com> wrote:
>
>> Hi Mahesh,
>>
>> Please find the comments inline.
>>
>> does data stream is taken to ML as the event publisher's format through
>>> event publisher. Or  we can use direct traffic that comes to event
>>> receiver, or else as streams
>>>
>> We intend to use the direct data as even streams.
>>
>> 1.) Those data coming from wso2 DAS to ML are coming as streams?
>>>
>> No, WSO2 ML doesn't use any even stream. The data stored in tables in DAS
>> is loaded into ML.
>>
>> 2.) Are there any incremental learning algorithms currently active in
>>> ML?you mentioned that there are and they are with scala API. So there is a
>>> streaming support with that Scala API. In that API which format the data is
>>> aquired to ML?
>>>
>> No, there are no incremental learning algorithms in ML. The scala API is
>> about Spark MLLib. MLLib supports streaming k-means and other generalized
>> linear models (linear regression variants and logistic regression) with
>> Scala API. What they basically do in those implementations is retraining
>> the trained models with mini batches when data sequentially arrives. There,
>> the breaking of streaming data into mini batches is done with the help of
>> Spark Streaming. But we do not intend to use Spark streaming in our
>> implementation. What we need to do is implement a similar behavior for
>> event streams using the Java API.  The Java API has the following methods:
>>
>>    - *createModel*(Vector weights, double intercept) - for GLMs
>>    - *setInitialModel*(KMeansModel model) - for K means
>>
>> With the help of these methods, we can train models again with newly
>> arriving data, keeping the characteristics learned with the previous data.
>> When implementing this, we need to pay attention to other parameters of
>> incremental learning such as data horizon and data obsolescence (indicated
>> in the project ideas page).
>> We need to discuss on how to add these with CEP event streams. I have
>> added Suho into the thread for more clarification.
>>
>> Best regards.
>>
>>
>> On Sat, Mar 5, 2016 at 5:15 PM, Mahesh Dananjaya <
>> dananjayamah...@gmail.com> wrote:
>>
>>> Hi maheshakya,
>>> as we concerned to use WSO2 CEP to handle streaming data and implement
>>> the machine learning algorithms with Spark MLLib, does data stream is taken
>>> to ML as the event publisher's format through event publisher. Or  we can
>>> use direct traffic that comes to event receiver, or else as streams.
>>> referring to https://docs.wso2.com/display/CEP410/User+Guide
>>> 1.) Those data coming from wso2 DAS to ML are coming as streams?
>>> 2.) Are there any incremental learning algorithms currently active
>>> in ML?you mentioned that there are and they are with scala API. So there is
>>> a streaming support with that Scala API. In that API which format the data
>>> is aquired to 

Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]

2016-03-07 Thread Mahesh Dananjaya
Hi Maheshakya,
Great, thank you. I already have ML and CEP set up and am working more with
them. Is that Scala API included in your current product or repo? Thank you.
BR,
Mahesh.

On Sun, Mar 6, 2016 at 5:49 PM, Maheshakya Wijewardena 
wrote:

> Hi Mahesh,
>
> Please find the comments inline.
>
> does data stream is taken to ML as the event publisher's format through
>> event publisher. Or  we can use direct traffic that comes to event
>> receiver, or else as streams
>>
> We intend to use the direct data as even streams.
>
> 1.) Those data coming from wso2 DAS to ML are coming as streams?
>>
> No, WSO2 ML doesn't use any even stream. The data stored in tables in DAS
> is loaded into ML.
>
> 2.) Are there any incremental learning algorithms currently active in
>> ML?you mentioned that there are and they are with scala API. So there is a
>> streaming support with that Scala API. In that API which format the data is
>> aquired to ML?
>>
> No, there are no incremental learning algorithms in ML. The scala API is
> about Spark MLLib. MLLib supports streaming k-means and other generalized
> linear models (linear regression variants and logistic regression) with
> Scala API. What they basically do in those implementations is retraining
> the trained models with mini batches when data sequentially arrives. There,
> the breaking of streaming data into mini batches is done with the help of
> Spark Streaming. But we do not intend to use Spark streaming in our
> implementation. What we need to do is implement a similar behavior for
> event streams using the Java API.  The Java API has the following methods:
>
>    - *createModel*(Vector weights, double intercept) - for GLMs
>    - *setInitialModel*(KMeansModel model) - for K means
>
> With the help of these methods, we can train models again with newly
> arriving data, keeping the characteristics learned with the previous data.
> When implementing this, we need to pay attention to other parameters of
> incremental learning such as data horizon and data obsolescence (indicated
> in the project ideas page).
> We need to discuss on how to add these with CEP event streams. I have
> added Suho into the thread for more clarification.
>
> Best regards.
>
>
> On Sat, Mar 5, 2016 at 5:15 PM, Mahesh Dananjaya <
> dananjayamah...@gmail.com> wrote:
>
>> Hi maheshakya,
>> as we concerned to use WSO2 CEP to handle streaming data and implement
>> the machine learning algorithms with Spark MLLib, does data stream is taken
>> to ML as the event publisher's format through event publisher. Or  we can
>> use direct traffic that comes to event receiver, or else as streams.
>> referring to https://docs.wso2.com/display/CEP410/User+Guide
>> 1.) Those data coming from wso2 DAS to ML are coming as streams?
>> 2.) Are there any incremental learning algorithms currently active in
>> ML?you mentioned that there are and they are with scala API. So there is a
>> streaming support with that Scala API. In that API which format the data is
>> aquired to ML?
>>
>> thank you.
>> BR,
>> Mahesh.
>>
>> On Fri, Mar 4, 2016 at 2:03 PM, Maheshakya Wijewardena <
>> mahesha...@wso2.com> wrote:
>>
>>> Hi Mahesh,
>>>
>>> We had to modify a the project scope a little to suit best for the
>>> requirements. We will update the project idea with those concerns soon and
>>> let you know.
>>>
>>> We do not support streaming data in WSO2 Machine learner at the moment.
>>> The new concern is to use WSO2 CEP to handle streaming data and implement
>>> the machine learning algorithms with Spark MLLib. You can look at the
>>> streaming k-means and streaming linear regression implementations in MLLib.
>>> Currently, the API is only for scala. Our need is to get the Java APIs of
>>> k-means and generalized linear models to support incremental learning with
>>> streaming data. This has to be done as mini-batch learning since these
>>> algorithms operates as stochastic gradient descents so that any learning
>>> with new data can be done on top of the previously learned models. So
>>> please go through the those APIs[1][2][3] and try to get an idea.
>>> Also please try to understand how event streams work in WSO2 CEP [4][5].
>>>
>>> Best regards.
>>>
>>> [1]
>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html
>>> [2]
>>> 

Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]

2016-03-06 Thread Maheshakya Wijewardena
Hi Mahesh,

Please find the comments inline.

> does data stream is taken to ML as the event publisher's format through
> event publisher. Or  we can use direct traffic that comes to event
> receiver, or else as streams
>
We intend to use the direct data as event streams.

> 1.) Those data coming from wso2 DAS to ML are coming as streams?
>
No, WSO2 ML doesn't use any event stream. The data stored in tables in DAS
is loaded into ML.

> 2.) Are there any incremental learning algorithms currently active in
> ML? You mentioned that there are and they are with scala API. So there is a
> streaming support with that Scala API. In that API which format the data is
> acquired to ML?
>
No, there are no incremental learning algorithms in ML. The Scala API is
that of Spark MLLib. MLLib supports streaming k-means and streaming
generalized linear models (linear regression variants and logistic
regression) through its Scala API. What those implementations basically do
is retrain the already trained model with mini-batches as data arrives
sequentially. There, the streaming data is broken into mini-batches with
the help of Spark Streaming. But we do not intend to use Spark Streaming in
our implementation. What we need to do is implement similar behavior for
event streams using the Java API. The Java API has the following methods:

   - *createModel*(Vector weights, double intercept) - for GLMs
   - *setInitialModel*(KMeansModel model) - for K means

With the help of these methods, we can train models again with newly
arriving data, keeping the characteristics learned from the previous data.
When implementing this, we need to pay attention to other parameters of
incremental learning such as data horizon and data obsolescence (indicated
in the project ideas page).
We need to discuss how to add these with CEP event streams. I have added
Suho to the thread for more clarification.
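
For the k-means side, the idea of seeding a new run with the previous
centers can be sketched in plain Java as follows. This is a rough
illustration, not MLLib code: WarmStartKMeans, State and update are
hypothetical names, and the decay parameter is just one assumed way to
express data obsolescence (a value of 1.0 keeps all history, 0.0 forgets
it), similar in spirit to the decay factor of MLLib's streaming k-means.

```java
/** Plain-Java sketch: one mini-batch k-means step seeded with previously saved centers. */
final class WarmStartKMeans {

    /** Saved state: cluster centers plus the (discounted) amount of data behind each one. */
    static final class State {
        final double[][] centers;
        final double[] weights;

        State(double[][] centers, double[] weights) {
            this.centers = centers;
            this.weights = weights;
        }
    }

    /** Index of the center closest to the point (squared Euclidean distance). */
    static int nearest(double[][] centers, double[] point) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int j = 0; j < point.length; j++) {
                double diff = centers[c][j] - point[j];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /** Blends the old centers (discounted by decay) with the means of the new batch. */
    static State update(State prev, double[][] batch, double decay) {
        int k = prev.centers.length;
        int d = prev.centers[0].length;
        double[][] sums = new double[k][d];
        double[] counts = new double[k];
        for (double[] point : batch) {
            int c = nearest(prev.centers, point);
            counts[c] += 1.0;
            for (int j = 0; j < d; j++) {
                sums[c][j] += point[j];
            }
        }
        double[][] centers = new double[k][];
        double[] weights = new double[k];
        for (int c = 0; c < k; c++) {
            double oldWeight = prev.weights[c] * decay;
            weights[c] = oldWeight + counts[c];
            centers[c] = prev.centers[c].clone();
            if (weights[c] > 0.0) {
                for (int j = 0; j < d; j++) {
                    centers[c][j] = (prev.centers[c][j] * oldWeight + sums[c][j]) / weights[c];
                }
            } // otherwise the center has no data behind it and is kept unchanged
        }
        return new State(centers, weights);
    }

    public static void main(String[] args) {
        // Made-up initial centers with no history behind them.
        State state = new State(new double[][] {{0.0, 0.0}, {10.0, 10.0}}, new double[] {0.0, 0.0});
        state = update(state, new double[][] {{1.0, 1.0}, {3.0, 3.0}, {9.0, 9.0}}, 1.0);
        state = update(state, new double[][] {{4.0, 4.0}}, 0.5);
        System.out.println("center 0: " + java.util.Arrays.toString(state.centers[0]));
        System.out.println("center 1: " + java.util.Arrays.toString(state.centers[1]));
    }
}
```

A data horizon could be handled on top of this by limiting how many past
batches contribute at all; that part is left out of the sketch.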

Best regards.


On Sat, Mar 5, 2016 at 5:15 PM, Mahesh Dananjaya 
wrote:

> Hi maheshakya,
> as we concerned to use WSO2 CEP to handle streaming data and implement the
> machine learning algorithms with Spark MLLib, does data stream is taken to
> ML as the event publisher's format through event publisher. Or  we can use
> direct traffic that comes to event receiver, or else as streams. referring
> to https://docs.wso2.com/display/CEP410/User+Guide
> 1.) Those data coming from wso2 DAS to ML are coming as streams?
> 2.) Are there any incremental learning algorithms currently active in
> ML?you mentioned that there are and they are with scala API. So there is a
> streaming support with that Scala API. In that API which format the data is
> aquired to ML?
>
> thank you.
> BR,
> Mahesh.
>
> On Fri, Mar 4, 2016 at 2:03 PM, Maheshakya Wijewardena <
> mahesha...@wso2.com> wrote:
>
>> Hi Mahesh,
>>
>> We had to modify a the project scope a little to suit best for the
>> requirements. We will update the project idea with those concerns soon and
>> let you know.
>>
>> We do not support streaming data in WSO2 Machine learner at the moment.
>> The new concern is to use WSO2 CEP to handle streaming data and implement
>> the machine learning algorithms with Spark MLLib. You can look at the
>> streaming k-means and streaming linear regression implementations in MLLib.
>> Currently, the API is only for scala. Our need is to get the Java APIs of
>> k-means and generalized linear models to support incremental learning with
>> streaming data. This has to be done as mini-batch learning since these
>> algorithms operates as stochastic gradient descents so that any learning
>> with new data can be done on top of the previously learned models. So
>> please go through the those APIs[1][2][3] and try to get an idea.
>> Also please try to understand how event streams work in WSO2 CEP [4][5].
>>
>> Best regards.
>>
>> [1]
>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html
>> [2]
>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeans.html
>> [3]
>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/classification/LogisticRegressionWithSGD.html
>> [4] https://docs.wso2.com/display/CEP310/Working+with+Event+Streams
>> [5] https://docs.wso2.com/display/CEP310/Working+with+Execution+Plans
>>
>> On Fri, Mar 4, 2016 at 11:26 AM, Mahesh Dananjaya <
>> dananjayamah...@gmail.com> wrote:
>>
>>> Hi 

Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]

2016-03-05 Thread Mahesh Dananjaya
Hi Maheshakya,
Since we intend to use WSO2 CEP to handle streaming data and implement the
machine learning algorithms with Spark MLLib, is the data stream passed to
ML in the event publisher's format through an event publisher? Or can we
use the direct traffic that comes to the event receiver, or else take it as
streams? I am referring to https://docs.wso2.com/display/CEP410/User+Guide
1.) Does the data coming from WSO2 DAS to ML arrive as streams?
2.) Are there any incremental learning algorithms currently active in
ML? You mentioned that there are and that they come with the Scala API. So
there is streaming support with that Scala API. In that API, in which format
is the data acquired by ML?

Thank you.
BR,
Mahesh.

On Fri, Mar 4, 2016 at 2:03 PM, Maheshakya Wijewardena 
wrote:

> Hi Mahesh,
>
> We had to modify a the project scope a little to suit best for the
> requirements. We will update the project idea with those concerns soon and
> let you know.
>
> We do not support streaming data in WSO2 Machine learner at the moment.
> The new concern is to use WSO2 CEP to handle streaming data and implement
> the machine learning algorithms with Spark MLLib. You can look at the
> streaming k-means and streaming linear regression implementations in MLLib.
> Currently, the API is only for scala. Our need is to get the Java APIs of
> k-means and generalized linear models to support incremental learning with
> streaming data. This has to be done as mini-batch learning since these
> algorithms operates as stochastic gradient descents so that any learning
> with new data can be done on top of the previously learned models. So
> please go through the those APIs[1][2][3] and try to get an idea.
> Also please try to understand how event streams work in WSO2 CEP [4][5].
>
> Best regards.
>
> [1]
> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html
> [2]
> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeans.html
> [3]
> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/classification/LogisticRegressionWithSGD.html
> [4] https://docs.wso2.com/display/CEP310/Working+with+Event+Streams
> [5] https://docs.wso2.com/display/CEP310/Working+with+Execution+Plans
>
> On Fri, Mar 4, 2016 at 11:26 AM, Mahesh Dananjaya <
> dananjayamah...@gmail.com> wrote:
>
>> Hi maheshakya,
>> give me sometime to go through your ML package. Do current product have
>> any stream data support?. i did some university projects related to machine
>> learning with regressions,modelling, factor analysis, cluster analysis and
>> classification problems (Discriminant Analysis) with SVM (Support Vector
>> machines), Neural networks, LS classification and ML(Maximum likelihood).
>> give me sometime to see how wso2 architecture works.then i can come up with
>> good architecture.thank you.
>> BR,
>> Mahesh.
>>
>> On Wed, Mar 2, 2016 at 2:41 PM, Mahesh Dananjaya <
>> dananjayamah...@gmail.com> wrote:
>>
>>> Hi Maheshakya,
>>> Thank you for the resources. I will go through this and looking forward
>>> to this proposed project.Thank you.
>>> BR,
>>> Mahesh.
>>>
>>> On Wed, Mar 2, 2016 at 1:52 PM, Maheshakya Wijewardena <
>>> mahesha...@wso2.com> wrote:
>>>
 Hi Mahesh,

 Thank you for the interest for this project.

 We would like to know what type of similar projects you have worked on.
 You may have seen that WSO2 Machine Learner supports several learning
 algorithms at the moment[1]. This project intends to leverage the existing
 algorithms in WSO2 Machine Learner to support streaming data. As an
 initiative, first you can get an idea about what WSO2 Machine Learner does
 and how it operates. You can download WSO2 Machine Learner from product
 page[2] and the the source code [3]. ML is using Apache Spark MLLib[4] for
 its' algorithms so it's better to read and understand what it does as well.

 In order to get an idea about the deliverables and the scope of this
 project, try to understand how Spark streaming[5] (see examples) handles
 streaming data. Also, have a look in the streaming algorithms[6][7]
 supported by MLLib. There are two approaches discussed to employ
 incremental learning in ML in the project proposals page. These streaming
 algorithms can be directly used in the first approach. For the other
 approach, the your implementation should contain a procedure to create mini
 batches from streaming data with relevant sizes (i.e. a moving window) and
 do periodic retraining of the same algorithm.

 To start with the project, you will need to come up with a suitable
 plan and an architecture first.

 Please watch the video referenced in the proposal (reference: 5). It
 will help you getting a better idea about machine learning algorithms with
 streaming data.

 Let us know if you need any help with these.

 Best regards

Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]

2016-03-04 Thread Maheshakya Wijewardena
Hi Mahesh,

We had to modify the project scope a little to best suit the
requirements. We will update the project idea with those concerns soon and
let you know.

We do not support streaming data in WSO2 Machine Learner at the moment. The
new plan is to use WSO2 CEP to handle streaming data and implement the
machine learning algorithms with Spark MLLib. You can look at the streaming
k-means and streaming linear regression implementations in MLLib.
Currently, that API is only for Scala. Our need is to get the Java APIs of
k-means and generalized linear models to support incremental learning with
streaming data. This has to be done as mini-batch learning: since these
algorithms operate as stochastic gradient descent, any learning with new
data can be done on top of the previously learned models. So please go
through those APIs[1][2][3] and try to get an idea.
Also, please try to understand how event streams work in WSO2 CEP [4][5].
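
As a rough picture of the mini-batch idea, the sketch below groups events
that arrive one at a time into fixed-size batches and hands each full batch
to a training callback. It is plain Java and does not use the WSO2 CEP or
Spark APIs; MiniBatchBuffer, onEvent and the batch size are hypothetical
choices made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Plain-Java sketch: collect single events into fixed-size mini-batches. */
final class MiniBatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> onBatch;
    private final List<T> pending = new ArrayList<>();

    MiniBatchBuffer(int batchSize, Consumer<List<T>> onBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.onBatch = onBatch;
    }

    /** Called once per arriving event; emits a copy of the batch as soon as it is full. */
    void onEvent(T event) {
        pending.add(event);
        if (pending.size() == batchSize) {
            onBatch.accept(new ArrayList<>(pending));
            pending.clear();
        }
    }

    /** Number of events still waiting for the current batch to fill up. */
    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        // The callback stands in for "train the saved model further on this batch".
        MiniBatchBuffer<Double> buffer =
                new MiniBatchBuffer<>(3, batch -> System.out.println("train on " + batch));
        for (double value = 1.0; value <= 7.0; value += 1.0) {
            buffer.onEvent(value);
        }
        System.out.println("events still pending: " + buffer.pendingCount());
    }
}
```

A time-based or sliding window could replace the fixed count; the fixed size
is used here only because it is the simplest case to show.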

Best regards.

[1]
http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html
[2]
http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeans.html
[3]
http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/classification/LogisticRegressionWithSGD.html
[4] https://docs.wso2.com/display/CEP310/Working+with+Event+Streams
[5] https://docs.wso2.com/display/CEP310/Working+with+Execution+Plans

On Fri, Mar 4, 2016 at 11:26 AM, Mahesh Dananjaya  wrote:

> Hi maheshakya,
> give me sometime to go through your ML package. Do current product have
> any stream data support?. i did some university projects related to machine
> learning with regressions,modelling, factor analysis, cluster analysis and
> classification problems (Discriminant Analysis) with SVM (Support Vector
> machines), Neural networks, LS classification and ML(Maximum likelihood).
> give me sometime to see how wso2 architecture works.then i can come up with
> good architecture.thank you.
> BR,
> Mahesh.
>
> On Wed, Mar 2, 2016 at 2:41 PM, Mahesh Dananjaya <
> dananjayamah...@gmail.com> wrote:
>
>> Hi Maheshakya,
>> Thank you for the resources. I will go through this and looking forward
>> to this proposed project.Thank you.
>> BR,
>> Mahesh.
>>
>> On Wed, Mar 2, 2016 at 1:52 PM, Maheshakya Wijewardena <
>> mahesha...@wso2.com> wrote:
>>
>>> Hi Mahesh,
>>>
>>> Thank you for the interest for this project.
>>>
>>> We would like to know what type of similar projects you have worked on.
>>> You may have seen that WSO2 Machine Learner supports several learning
>>> algorithms at the moment[1]. This project intends to leverage the existing
>>> algorithms in WSO2 Machine Learner to support streaming data. As an
>>> initiative, first you can get an idea about what WSO2 Machine Learner does
>>> and how it operates. You can download WSO2 Machine Learner from product
>>> page[2] and the the source code [3]. ML is using Apache Spark MLLib[4] for
>>> its' algorithms so it's better to read and understand what it does as well.
>>>
>>> In order to get an idea about the deliverables and the scope of this
>>> project, try to understand how Spark streaming[5] (see examples) handles
>>> streaming data. Also, have a look in the streaming algorithms[6][7]
>>> supported by MLLib. There are two approaches discussed to employ
>>> incremental learning in ML in the project proposals page. These streaming
>>> algorithms can be directly used in the first approach. For the other
>>> approach, the your implementation should contain a procedure to create mini
>>> batches from streaming data with relevant sizes (i.e. a moving window) and
>>> do periodic retraining of the same algorithm.
>>>
>>> To start with the project, you will need to come up with a suitable plan
>>> and an architecture first.
>>>
>>> Please watch the video referenced in the proposal (reference: 5). It
>>> will help you getting a better idea about machine learning algorithms with
>>> streaming data.
>>>
>>> Let us know if you need any help with these.
>>>
>>> Best regards
>>>
>>> [1] https://docs.wso2.com/display/ML110/Machine+Learner+Algorithms
>>> [2] http://wso2.com/products/machine-learner/
>>> [3]
>>> https://docs.wso2.com/display/ML110/Building+from+Source#BuildingfromSource-Downloadingthesourcecheckout
>>> [4] https://spark.apache.org/docs/1.4.1/mllib-guide.html
>>> [5] https://spark.apache.org/docs/1.4.1/streaming-programming-guide.html
>>> [6]
>>> https://spark.apache.org/docs/1.4.1/mllib-linear-methods.html#streaming-linear-regression
>>> [7]
>>> https://spark.apache.org/docs/1.4.1/mllib-clustering.html#streaming-k-means
>>>
>>> On Wed, Mar 2, 2016 at 1:19 PM, Mahesh Dananjaya <
>>> dananjayamah...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>> I am interested in contributing to proposal 6: "Predictive analytic with
>>>> online data for WSO2 Machine Learner" for GSOC 2016. Since I have
>>>> been involved in some similar projects, I think it will be

Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]

2016-03-03 Thread Mahesh Dananjaya
Hi Maheshakya,
Please give me some time to go through your ML package. Does the current
product have any streaming data support? I did some university projects
related to machine learning, covering regression, modelling, factor
analysis, cluster analysis, and classification problems (discriminant
analysis) with SVMs (support vector machines), neural networks, LS
classification, and ML (maximum likelihood). Give me some time to see how
the WSO2 architecture works; then I can come up with a good architecture.
Thank you.
BR,
Mahesh.

On Wed, Mar 2, 2016 at 2:41 PM, Mahesh Dananjaya 
wrote:

> Hi Maheshakya,
> Thank you for the resources. I will go through them, and I am looking
> forward to this proposed project. Thank you.
> BR,
> Mahesh.
>
> On Wed, Mar 2, 2016 at 1:52 PM, Maheshakya Wijewardena <
> mahesha...@wso2.com> wrote:
>
>> Hi Mahesh,
>>
>> Thank you for your interest in this project.
>>
>> We would like to know what kind of similar projects you have worked on.
>> You may have seen that WSO2 Machine Learner currently supports several
>> learning algorithms[1]. This project intends to leverage the existing
>> algorithms in WSO2 Machine Learner to support streaming data. As a first
>> step, get an idea of what WSO2 Machine Learner does and how it operates.
>> You can download WSO2 Machine Learner from the product page[2] and get
>> the source code[3]. ML uses Apache Spark MLlib[4] for its algorithms, so
>> it is worth reading about and understanding what MLlib does as well.
>>
>> To get an idea of the deliverables and the scope of this project, try to
>> understand how Spark Streaming[5] (see the examples) handles streaming
>> data. Also, have a look at the streaming algorithms[6][7] supported by
>> MLlib. The project proposals page discusses two approaches to
>> incremental learning in ML. The streaming algorithms can be used
>> directly in the first approach. For the other approach, your
>> implementation should contain a procedure that creates mini batches of
>> a suitable size from the streaming data (i.e. a moving window) and
>> periodically retrains the same algorithm on them.
>>
>> To start with the project, you will need to come up with a suitable plan
>> and an architecture first.
>>
>> Please watch the video referenced in the proposal (reference 5). It will
>> help you get a better idea of machine learning algorithms with streaming
>> data.
>>
>> Let us know if you need any help with these.
>>
>> Best regards
>>
>> [1] https://docs.wso2.com/display/ML110/Machine+Learner+Algorithms
>> [2] http://wso2.com/products/machine-learner/
>> [3]
>> https://docs.wso2.com/display/ML110/Building+from+Source#BuildingfromSource-Downloadingthesourcecheckout
>> [4] https://spark.apache.org/docs/1.4.1/mllib-guide.html
>> [5] https://spark.apache.org/docs/1.4.1/streaming-programming-guide.html
>> [6]
>> https://spark.apache.org/docs/1.4.1/mllib-linear-methods.html#streaming-linear-regression
>> [7]
>> https://spark.apache.org/docs/1.4.1/mllib-clustering.html#streaming-k-means
>>
>> On Wed, Mar 2, 2016 at 1:19 PM, Mahesh Dananjaya <
>> dananjayamah...@gmail.com> wrote:
>>
>>> Hi all,
>>> I am interested in contributing to proposal 6: "Predictive analytic with
>>> online data for WSO2 Machine Learner" for GSOC 2016. Since I have been
>>> involved in some similar projects, I think it will be a great experience
>>> for me. Please let me know what you think and what you suggest. I have
>>> been going through your documents. Thank you.
>>> Regards,
>>> Mahesh Dananjaya.
>>>
>>>
>>> ___
>>> Dev mailing list
>>> Dev@wso2.org
>>> http://wso2.org/cgi-bin/mailman/listinfo/dev
>>>
>>>
>>
>>
>> --
>> Pruthuvi Maheshakya Wijewardena
>> mahesha...@wso2.com
>> +94711228855
>>
>>
>>
>


Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]

2016-03-02 Thread Mahesh Dananjaya
Hi Maheshakya,
Thank you for the resources. I will go through them, and I am looking
forward to this proposed project. Thank you.
BR,
Mahesh.

On Wed, Mar 2, 2016 at 1:52 PM, Maheshakya Wijewardena 
wrote:

> Hi Mahesh,
>
> Thank you for your interest in this project.
>
> We would like to know what kind of similar projects you have worked on.
> You may have seen that WSO2 Machine Learner currently supports several
> learning algorithms[1]. This project intends to leverage the existing
> algorithms in WSO2 Machine Learner to support streaming data. As a first
> step, get an idea of what WSO2 Machine Learner does and how it operates.
> You can download WSO2 Machine Learner from the product page[2] and get
> the source code[3]. ML uses Apache Spark MLlib[4] for its algorithms, so
> it is worth reading about and understanding what MLlib does as well.
>
> To get an idea of the deliverables and the scope of this project, try to
> understand how Spark Streaming[5] (see the examples) handles streaming
> data. Also, have a look at the streaming algorithms[6][7] supported by
> MLlib. The project proposals page discusses two approaches to
> incremental learning in ML. The streaming algorithms can be used
> directly in the first approach. For the other approach, your
> implementation should contain a procedure that creates mini batches of
> a suitable size from the streaming data (i.e. a moving window) and
> periodically retrains the same algorithm on them.
>
> To start with the project, you will need to come up with a suitable plan
> and an architecture first.
>
> Please watch the video referenced in the proposal (reference 5). It will
> help you get a better idea of machine learning algorithms with streaming
> data.
>
> Let us know if you need any help with these.
>
> Best regards
>
> [1] https://docs.wso2.com/display/ML110/Machine+Learner+Algorithms
> [2] http://wso2.com/products/machine-learner/
> [3]
> https://docs.wso2.com/display/ML110/Building+from+Source#BuildingfromSource-Downloadingthesourcecheckout
> [4] https://spark.apache.org/docs/1.4.1/mllib-guide.html
> [5] https://spark.apache.org/docs/1.4.1/streaming-programming-guide.html
> [6]
> https://spark.apache.org/docs/1.4.1/mllib-linear-methods.html#streaming-linear-regression
> [7]
> https://spark.apache.org/docs/1.4.1/mllib-clustering.html#streaming-k-means
>
> On Wed, Mar 2, 2016 at 1:19 PM, Mahesh Dananjaya <
> dananjayamah...@gmail.com> wrote:
>
>> Hi all,
>> I am interested in contributing to proposal 6: "Predictive analytic with
>> online data for WSO2 Machine Learner" for GSOC 2016. Since I have been
>> involved in some similar projects, I think it will be a great experience
>> for me. Please let me know what you think and what you suggest. I have
>> been going through your documents. Thank you.
>> Regards,
>> Mahesh Dananjaya.
>>
>>
>>
>>
>
>
> --
> Pruthuvi Maheshakya Wijewardena
> mahesha...@wso2.com
> +94711228855
>
>
>


Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]

2016-03-02 Thread Maheshakya Wijewardena
Hi Mahesh,

Thank you for your interest in this project.

We would like to know what kind of similar projects you have worked on.
You may have seen that WSO2 Machine Learner currently supports several
learning algorithms[1]. This project intends to leverage the existing
algorithms in WSO2 Machine Learner to support streaming data. As a first
step, get an idea of what WSO2 Machine Learner does and how it operates.
You can download WSO2 Machine Learner from the product page[2] and get
the source code[3]. ML uses Apache Spark MLlib[4] for its algorithms, so
it is worth reading about and understanding what MLlib does as well.

To get an idea of the deliverables and the scope of this project, try to
understand how Spark Streaming[5] (see the examples) handles streaming
data. Also, have a look at the streaming algorithms[6][7] supported by
MLlib. The project proposals page discusses two approaches to
incremental learning in ML. The streaming algorithms can be used
directly in the first approach. For the other approach, your
implementation should contain a procedure that creates mini batches of
a suitable size from the streaming data (i.e. a moving window) and
periodically retrains the same algorithm on them.
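
The two approaches can be sketched roughly as follows. This is only an
illustration in plain Python, not Spark MLlib or WSO2 ML code: the names
fit_line, IncrementalSGD, and WindowedRetrainer, the one-variable linear
model, and the window, retraining-interval, and step-size parameters are
all assumptions made for the example.

```python
from collections import deque


def fit_line(points):
    """Ordinary least squares for y = a*x + b over (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:
        return 0.0, mean_y  # degenerate window: fall back to the mean of y
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = cov_xy / var_x
    return a, mean_y - a * mean_x


class IncrementalSGD:
    """First approach (hypothetical sketch): update the current model
    in place with one gradient step per incoming mini batch."""

    def __init__(self, step_size):
        self.a = 0.0
        self.b = 0.0
        self.step_size = step_size

    def update(self, batch):
        n = len(batch)
        # Gradient of the mean squared error (scaled by 1/2) over the batch.
        grad_a = sum((self.a * x + self.b - y) * x for x, y in batch) / n
        grad_b = sum((self.a * x + self.b - y) for x, y in batch) / n
        self.a -= self.step_size * grad_a
        self.b -= self.step_size * grad_b
        return self.a, self.b


class WindowedRetrainer:
    """Second approach (hypothetical sketch): keep a moving window of
    recent points and retrain from scratch at a fixed interval."""

    def __init__(self, window_size, retrain_every):
        self.window = deque(maxlen=window_size)  # oldest points drop out
        self.retrain_every = retrain_every
        self.seen = 0
        self.model = None  # (slope, intercept) once trained

    def observe(self, x, y):
        self.window.append((x, y))
        self.seen += 1
        if self.seen % self.retrain_every == 0:
            self.model = fit_line(self.window)
        return self.model
```

With window_size=4 and retrain_every=2, for example, WindowedRetrainer
refits the line on the four most recent points after every second arrival,
so older data stops influencing the model once it leaves the window.
IncrementalSGD never discards the model; each mini batch only nudges the
existing coefficients.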

To start with the project, you will need to come up with a suitable plan
and an architecture first.

Please watch the video referenced in the proposal (reference 5). It will
help you get a better idea of machine learning algorithms with streaming
data.

Let us know if you need any help with these.

Best regards

[1] https://docs.wso2.com/display/ML110/Machine+Learner+Algorithms
[2] http://wso2.com/products/machine-learner/
[3]
https://docs.wso2.com/display/ML110/Building+from+Source#BuildingfromSource-Downloadingthesourcecheckout
[4] https://spark.apache.org/docs/1.4.1/mllib-guide.html
[5] https://spark.apache.org/docs/1.4.1/streaming-programming-guide.html
[6]
https://spark.apache.org/docs/1.4.1/mllib-linear-methods.html#streaming-linear-regression
[7]
https://spark.apache.org/docs/1.4.1/mllib-clustering.html#streaming-k-means

On Wed, Mar 2, 2016 at 1:19 PM, Mahesh Dananjaya 
wrote:

> Hi all,
> I am interested in contributing to proposal 6: "Predictive analytic with
> online data for WSO2 Machine Learner" for GSOC 2016. Since I have been
> involved in some similar projects, I think it will be a great experience
> for me. Please let me know what you think and what you suggest. I have
> been going through your documents. Thank you.
> Regards,
> Mahesh Dananjaya.
>
>
>
>


-- 
Pruthuvi Maheshakya Wijewardena
mahesha...@wso2.com
+94711228855


[Dev] Fwd: GSOC2016: Proposal 6: [ML]

2016-03-01 Thread Mahesh Dananjaya
Hi all,
I am interested in contributing to proposal 6: "Predictive analytic with
online data for WSO2 Machine Learner" for GSOC 2016. Since I have been
involved in some similar projects, I think it will be a great experience
for me. Please let me know what you think and what you suggest. I have
been going through your documents. Thank you.
Regards,
Mahesh Dananjaya.