Hi Mahesh,

Thank you for your interest in this project.

We would like to know what kind of similar projects you have worked on. You
may have seen that WSO2 Machine Learner currently supports several learning
algorithms[1]. This project intends to leverage the existing algorithms in
WSO2 Machine Learner to support streaming data. As a first step, get an idea
of what WSO2 Machine Learner does and how it operates. You can download WSO2
Machine Learner from the product page[2] and get the source code[3]. ML uses
Apache Spark MLlib[4] for its algorithms, so it is worth reading up on and
understanding what MLlib does as well.

To get an idea of the deliverables and the scope of this project, try to
understand how Spark Streaming[5] (see the examples) handles streaming data.
Also, have a look at the streaming algorithms[6][7] supported by MLlib. The
project proposals page discusses two approaches to incremental learning in
ML. These streaming algorithms can be used directly in the first approach.
For the other approach, your implementation should contain a procedure that
creates mini batches of a suitable size from the streaming data (i.e. a
moving window) and periodically retrains the same algorithm.
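
To make the second approach concrete, here is a rough sketch of the moving
window idea in plain Python. It is only an illustration under my own
assumptions, not how ML or Spark implements it: the names WindowedRetrainer,
window_size and retrain_every are hypothetical, and a simple least-squares
line fit stands in for whichever batch algorithm would actually be retrained.

```python
from collections import deque
from typing import Deque, Iterable, Optional, Tuple

Point = Tuple[float, float]  # (feature x, label y)


def fit_line(points: Iterable[Point]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept.

    Stand-in for the batch algorithm that would be retrained.
    """
    pts = list(points)
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pts)
    if var_x == 0.0:
        # All x values are identical: fall back to a constant model.
        return 0.0, mean_y
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


class WindowedRetrainer:
    """Keeps the most recent points and refits the model periodically.

    Hypothetical helper: window_size is the mini batch (moving window)
    length, retrain_every is the number of new points between refits.
    """

    def __init__(self, window_size: int, retrain_every: int) -> None:
        # A deque with maxlen drops the oldest point automatically.
        self.window: Deque[Point] = deque(maxlen=window_size)
        self.retrain_every = retrain_every
        self.seen = 0
        self.model: Optional[Tuple[float, float]] = None

    def observe(self, point: Point) -> None:
        """Add one streamed point and retrain if a refit is due."""
        self.window.append(point)
        self.seen += 1
        if self.seen % self.retrain_every == 0:
            self.model = fit_line(self.window)

    def predict(self, x: float) -> Optional[float]:
        """Predict with the latest model, or None before the first fit."""
        if self.model is None:
            return None
        slope, intercept = self.model
        return slope * x + intercept
```

Because old points fall out of the window, the model follows the stream when
the underlying relationship changes. Choosing the window size and the
retraining period is part of the design work for this approach.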

To start the project, you will first need to come up with a suitable plan
and an architecture.

Please watch the video referenced in the proposal (reference: 5). It will
help you get a better idea of machine learning algorithms for streaming
data.

Let us know if you need any help with these.

Best regards

[1] https://docs.wso2.com/display/ML110/Machine+Learner+Algorithms
[2] http://wso2.com/products/machine-learner/
[3] https://docs.wso2.com/display/ML110/Building+from+Source#BuildingfromSource-Downloadingthesourcecheckout
[4] https://spark.apache.org/docs/1.4.1/mllib-guide.html
[5] https://spark.apache.org/docs/1.4.1/streaming-programming-guide.html
[6] https://spark.apache.org/docs/1.4.1/mllib-linear-methods.html#streaming-linear-regression
[7] https://spark.apache.org/docs/1.4.1/mllib-clustering.html#streaming-k-means

On Wed, Mar 2, 2016 at 1:19 PM, Mahesh Dananjaya <dananjayamah...@gmail.com>
wrote:

> Hi all,
> I am interested in contributing to proposal 6: "Predictive analytic with
> online data for WSO2 Machine Learner" for GSOC2 this time. Since I have
> been engaged in some similar projects, I think it will be a great
> experience for me. Please let me know what you think and what you suggest.
> I have been going through your documents. Thank you.
> regards,
> Mahesh Dananjaya.
>
>
> _______________________________________________
> Dev mailing list
> Dev@wso2.org
> http://wso2.org/cgi-bin/mailman/listinfo/dev
>
>


-- 
Pruthuvi Maheshakya Wijewardena
mahesha...@wso2.com
+94711228855
