Hi Mahesh,

Please submit your final proposal to GSoC before the deadline.
Regards,
Supun

On Mon, Mar 21, 2016 at 1:00 PM, Maheshakya Wijewardena <mahesha...@wso2.com> wrote:

> Hi Mahesh,
>
> The deadline for submitting your proposal is March 25th, 2016,
> so please start writing the proposal and get feedback.
>
> Best regards.
>
> On Tue, Mar 15, 2016 at 4:14 PM, Mahesh Dananjaya <dananjayamah...@gmail.com> wrote:
>
>> Hi Maheshakya,
>> OK. I have been trying some examples, splitting the data and training
>> incrementally. I am still working on that, and I have been adding the examples
>> to my GitHub repo too: https://github.com/dananjayamahesh/GSOC2016 .
>> I saw that there is only Scala API support for those streaming algorithms
>> in Spark, so my task is to develop a Java API. I will let you know my
>> progress. Thank you very much.
>> BR,
>> Mahesh
>>
>> On Tue, Mar 15, 2016 at 3:21 PM, Maheshakya Wijewardena <mahesha...@wso2.com> wrote:
>>
>>> Hi Mahesh,
>>>
>>> No, you don't need to use Hadoop at any stage in this project. Everything
>>> you need (regarding ML algorithms) is in Spark.
>>> You can also use Spark MLlib's methods to randomly split datasets.
>>>
>>> Best regards.
>>>
>>> On Mon, Mar 14, 2016 at 1:28 PM, Mahesh Dananjaya <dananjayamah...@gmail.com> wrote:
>>>
>>>> Hi Maheshakya,
>>>> I am writing some Java programs that break the dataset into
>>>> several pieces and train a model repeatedly with those pieces using
>>>> Spark MLlib. Do I have to do anything with Hadoop at this stage? I am
>>>> working in standalone mode. Thank you.
>>>> BR,
>>>> Mahesh.
>>>>
>>>> On Sun, Mar 13, 2016 at 6:30 PM, Maheshakya Wijewardena <mahesha...@wso2.com> wrote:
>>>>
>>>>> Hi Mahesh,
>>>>>
>>>>> You don't have to look into carbon-ml.
>>>>>
>>>>> Best regards.
>>>>>
>>>>> On Sun, Mar 13, 2016 at 5:49 PM, Mahesh Dananjaya <dananjayamah...@gmail.com> wrote:
>>>>>
>>>>>> Hi Maheshakya,
>>>>>> I am working on some examples related to Spark and ML. Is there
>>>>>> anything to do with carbon-ml?
>>>>>> I think I don't need to look into that one, do I?
>>>>>> BR,
>>>>>> Mahesh
>>>>>>
>>>>>> On Tue, Mar 8, 2016 at 11:55 AM, Maheshakya Wijewardena <mahesha...@wso2.com> wrote:
>>>>>>
>>>>>>> Hi Mahesh,
>>>>>>>
>>>>>>>> does that Scala API is with your current product or repo?
>>>>>>>
>>>>>>> No, we don't have the Scala API included. What we want is to design
>>>>>>> Java implementations of those algorithms that train with mini-batches
>>>>>>> of streaming data with the help of the aforementioned methods, so that
>>>>>>> we can include them as a CEP extension.
>>>>>>>
>>>>>>> To clarify, please try to write a simple Java program using Spark
>>>>>>> MLlib linear regression and k-means clustering with a sample data set
>>>>>>> (you can find a lot of data sets in the UCI repo[1]). You need to break
>>>>>>> the dataset into several pieces and train a model repeatedly with those.
>>>>>>> After each training run, save the model information (such as the
>>>>>>> weights and intercept for regression and the cluster centers for
>>>>>>> clustering - please check the arguments of the methods I have mentioned
>>>>>>> and save the required information of the model).
>>>>>>> When training a model with a new piece of data, use those methods to
>>>>>>> initialize it and pass the saved values as the arguments. This way you
>>>>>>> can start from where you stopped in the previous run.
>>>>>>>
>>>>>>> Let us know your observations and feel free to ask if you need to
>>>>>>> know anything more on this.
>>>>>>>
>>>>>>> We'll let you know what needs to be done to include this in CEP.
>>>>>>>
>>>>>>> Best regards.
>>>>>>>
>>>>>>> On Tue, Mar 8, 2016 at 10:59 AM, Mahesh Dananjaya <dananjayamah...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Maheshakya,
>>>>>>>> Great, thank you. I already have ML and CEP and am working more
>>>>>>>> towards it. Is that Scala API included in your current product or
>>>>>>>> repo? Thank you.
>>>>>>>> BR,
>>>>>>>> Mahesh.
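
The save-and-resume procedure described in the 8 March message (train on one piece, save the weights and intercept, start the next run from the saved values) can be sketched in plain Java without any Spark dependency. This is only an illustration under stated assumptions: the class and method names (WarmStartRegression, Model, train, demo) are hypothetical, the data is synthetic (y = 2x + 1), and a simple gradient descent loop stands in for the Spark MLlib training call.

```java
import java.util.Arrays;

/** Plain-Java sketch of warm-started training on pieces of a data set (not Spark code). */
class WarmStartRegression {

    /** Saved state of a linear model: one weight per feature plus an intercept. */
    static final class Model {
        final double[] weights;
        final double intercept;

        Model(double[] weights, double intercept) {
            this.weights = weights.clone();
            this.intercept = intercept;
        }
    }

    /** Runs gradient descent on one piece of data, starting from the given saved model. */
    static Model train(double[][] x, double[] y, Model start, int epochs, double stepSize) {
        double[] w = start.weights.clone();
        double b = start.intercept;
        for (int e = 0; e < epochs; e++) {
            double[] gradW = new double[w.length];
            double gradB = 0.0;
            for (int i = 0; i < x.length; i++) {
                double error = b - y[i];                       // prediction minus label
                for (int j = 0; j < w.length; j++) error += w[j] * x[i][j];
                for (int j = 0; j < w.length; j++) gradW[j] += error * x[i][j];
                gradB += error;
            }
            for (int j = 0; j < w.length; j++) w[j] -= stepSize * gradW[j] / x.length;
            b -= stepSize * gradB / x.length;
        }
        return new Model(w, b);
    }

    /** Splits a synthetic data set (y = 2x + 1) into four pieces and trains on them in turn. */
    static Model demo() {
        int n = 200, pieces = 4, pieceSize = n / pieces;
        double[][] x = new double[n][1];
        double[] y = new double[n];
        for (int i = 0; i < n; i++) {
            x[i][0] = (i % 20) / 10.0 - 1.0;                   // feature values in [-1.0, 0.9]
            y[i] = 2.0 * x[i][0] + 1.0;
        }
        Model model = new Model(new double[1], 0.0);           // the first run starts from zeros
        for (int p = 0; p < pieces; p++) {
            double[][] xPiece = Arrays.copyOfRange(x, p * pieceSize, (p + 1) * pieceSize);
            double[] yPiece = Arrays.copyOfRange(y, p * pieceSize, (p + 1) * pieceSize);
            // The model returned by the previous run is the starting point of this run.
            model = train(xPiece, yPiece, model, 200, 0.1);
        }
        return model;
    }

    public static void main(String[] args) {
        Model m = demo();
        System.out.println("weight=" + m.weights[0] + " intercept=" + m.intercept);
    }
}
```

In a Spark-based version, the Model object would presumably be replaced by the weights and intercept read from the trained MLlib model, and train by the MLlib call that accepts initial values; the loop over pieces would stay the same.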
>>>>>>>>
>>>>>>>> On Sun, Mar 6, 2016 at 5:49 PM, Maheshakya Wijewardena <mahesha...@wso2.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Mahesh,
>>>>>>>>>
>>>>>>>>> Please find the comments inline.
>>>>>>>>>
>>>>>>>>>> does data stream is taken to ML as the event publisher's format
>>>>>>>>>> through event publisher. Or we can use direct traffic that comes to
>>>>>>>>>> event receiver, or else as streams
>>>>>>>>>
>>>>>>>>> We intend to use the direct data as event streams.
>>>>>>>>>
>>>>>>>>>> 1.) Those data coming from wso2 DAS to ML are coming as streams?
>>>>>>>>>
>>>>>>>>> No, WSO2 ML doesn't use any event streams. The data stored in tables
>>>>>>>>> in DAS is loaded into ML.
>>>>>>>>>
>>>>>>>>>> 2.) Are there any incremental learning algorithms currently active
>>>>>>>>>> in ML? you mentioned that there are and they are with scala API. So
>>>>>>>>>> there is a streaming support with that Scala API. In that API which
>>>>>>>>>> format the data is acquired to ML?
>>>>>>>>>
>>>>>>>>> No, there are no incremental learning algorithms in ML. The Scala
>>>>>>>>> API I mentioned is Spark MLlib's. MLlib supports streaming k-means and
>>>>>>>>> other generalized linear models (linear regression variants and
>>>>>>>>> logistic regression) through its Scala API. What those implementations
>>>>>>>>> basically do is retrain the already trained models with mini-batches as
>>>>>>>>> data arrives sequentially. There, the streaming data is broken into
>>>>>>>>> mini-batches with the help of Spark Streaming. But we do not intend to
>>>>>>>>> use Spark Streaming in our implementation. What we need to do is
>>>>>>>>> implement a similar behavior for event streams using the Java API.
>>>>>>>>> The Java API has the following methods:
>>>>>>>>>
>>>>>>>>>    - *createModel*
>>>>>>>>>      <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html#createModel%28org.apache.spark.mllib.linalg.Vector,%20double%29>
>>>>>>>>>      (Vector
>>>>>>>>>      <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/linalg/Vector.html>
>>>>>>>>>      weights, double intercept) - for GLMs
>>>>>>>>>    - *setInitialModel*
>>>>>>>>>      <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeans.html#setInitialModel%28org.apache.spark.mllib.clustering.KMeansModel%29>
>>>>>>>>>      (KMeansModel
>>>>>>>>>      <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeansModel.html>
>>>>>>>>>      model) - for k-means
>>>>>>>>>
>>>>>>>>> With the help of these methods, we can train models again with
>>>>>>>>> newly arriving data while keeping the characteristics learned from the
>>>>>>>>> previous data. When implementing this, we need to pay attention to
>>>>>>>>> other parameters of incremental learning, such as data horizon and data
>>>>>>>>> obsolescence (indicated in the project ideas page).
>>>>>>>>> We need to discuss how to integrate these with CEP event streams. I
>>>>>>>>> have added Suho to the thread for more clarification.
>>>>>>>>>
>>>>>>>>> Best regards.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, Mar 5, 2016 at 5:15 PM, Mahesh Dananjaya <dananjayamah...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Maheshakya,
>>>>>>>>>> Since we are planning to use WSO2 CEP to handle streaming data and
>>>>>>>>>> implement the machine learning algorithms with Spark MLlib, is the
>>>>>>>>>> data stream taken to ML in the event publisher's format through the
>>>>>>>>>> event publisher? Or can we use the direct traffic that comes to the
>>>>>>>>>> event receiver, or else as streams?
>>>>>>>>>> Referring to https://docs.wso2.com/display/CEP410/User+Guide :
>>>>>>>>>> 1.) Is the data coming from WSO2 DAS to ML arriving as streams?
>>>>>>>>>> 2.) Are there any incremental learning algorithms currently
>>>>>>>>>> active in ML? You mentioned that there are and that they use the
>>>>>>>>>> Scala API. So there is streaming support with that Scala API. In that
>>>>>>>>>> API, in which format is the data acquired by ML?
>>>>>>>>>>
>>>>>>>>>> Thank you.
>>>>>>>>>> BR,
>>>>>>>>>> Mahesh.
>>>>>>>>>>
>>>>>>>>>> On Fri, Mar 4, 2016 at 2:03 PM, Maheshakya Wijewardena <mahesha...@wso2.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>>
>>>>>>>>>>> We had to modify the project scope a little to best suit the
>>>>>>>>>>> requirements. We will update the project idea with those concerns
>>>>>>>>>>> soon and let you know.
>>>>>>>>>>>
>>>>>>>>>>> We do not support streaming data in WSO2 Machine Learner at the
>>>>>>>>>>> moment. The new plan is to use WSO2 CEP to handle streaming data and
>>>>>>>>>>> implement the machine learning algorithms with Spark MLlib. You can
>>>>>>>>>>> look at the streaming k-means and streaming linear regression
>>>>>>>>>>> implementations in MLlib. Currently, that API is only available for
>>>>>>>>>>> Scala. Our need is to get the Java APIs of k-means and generalized
>>>>>>>>>>> linear models to support incremental learning with streaming data.
>>>>>>>>>>> This has to be done as mini-batch learning, since these algorithms
>>>>>>>>>>> operate as stochastic gradient descent, so that any learning with new
>>>>>>>>>>> data can be done on top of the previously learned models.
>>>>>>>>>>> So please go through those APIs[1][2][3] and try to get an idea.
>>>>>>>>>>> Also please try to understand how event streams work in WSO2 CEP
>>>>>>>>>>> [4][5].
>>>>>>>>>>>
>>>>>>>>>>> Best regards.
>>>>>>>>>>>
>>>>>>>>>>> [1] http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html
>>>>>>>>>>> [2] http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeans.html
>>>>>>>>>>> [3] http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/classification/LogisticRegressionWithSGD.html
>>>>>>>>>>> [4] https://docs.wso2.com/display/CEP310/Working+with+Event+Streams
>>>>>>>>>>> [5] https://docs.wso2.com/display/CEP310/Working+with+Execution+Plans
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Mar 4, 2016 at 11:26 AM, Mahesh Dananjaya <dananjayamah...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Maheshakya,
>>>>>>>>>>>> Give me some time to go through your ML package. Does the current
>>>>>>>>>>>> product have any stream data support? I did some university
>>>>>>>>>>>> projects related to machine learning with regression, modelling,
>>>>>>>>>>>> factor analysis, cluster analysis and classification problems
>>>>>>>>>>>> (discriminant analysis) with SVM (support vector machines), neural
>>>>>>>>>>>> networks, LS classification and ML (maximum likelihood). Give me
>>>>>>>>>>>> some time to see how the WSO2 architecture works; then I can come
>>>>>>>>>>>> up with a good architecture. Thank you.
>>>>>>>>>>>> BR,
>>>>>>>>>>>> Mahesh.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Mar 2, 2016 at 2:41 PM, Mahesh Dananjaya <dananjayamah...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Maheshakya,
>>>>>>>>>>>>> Thank you for the resources. I will go through them, and I am
>>>>>>>>>>>>> looking forward to this proposed project. Thank you.
>>>>>>>>>>>>> BR,
>>>>>>>>>>>>> Mahesh.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Mar 2, 2016 at 1:52 PM, Maheshakya Wijewardena <mahesha...@wso2.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you for your interest in this project.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We would like to know what type of similar projects you have
>>>>>>>>>>>>>> worked on. You may have seen that WSO2 Machine Learner supports
>>>>>>>>>>>>>> several learning algorithms at the moment[1]. This project intends
>>>>>>>>>>>>>> to leverage the existing algorithms in WSO2 Machine Learner to
>>>>>>>>>>>>>> support streaming data. As a first step, you can get an idea of
>>>>>>>>>>>>>> what WSO2 Machine Learner does and how it operates. You can
>>>>>>>>>>>>>> download WSO2 Machine Learner from the product page[2] and get the
>>>>>>>>>>>>>> source code [3]. ML uses Apache Spark MLlib[4] for its algorithms,
>>>>>>>>>>>>>> so it's better to read and understand what that does as well.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In order to get an idea about the deliverables and the scope
>>>>>>>>>>>>>> of this project, try to understand how Spark Streaming[5] (see the
>>>>>>>>>>>>>> examples) handles streaming data. Also, have a look at the
>>>>>>>>>>>>>> streaming algorithms[6][7] supported by MLlib. There are two
>>>>>>>>>>>>>> approaches discussed in the project proposals page for employing
>>>>>>>>>>>>>> incremental learning in ML. These streaming algorithms can be used
>>>>>>>>>>>>>> directly in the first approach. For the other approach, your
>>>>>>>>>>>>>> implementation should contain a procedure to create mini-batches
>>>>>>>>>>>>>> of relevant sizes from the streaming data (i.e. a moving window)
>>>>>>>>>>>>>> and do periodic retraining of the same algorithm.
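
The second approach above (a moving window over the stream, with periodic retraining) could look roughly like the following plain-Java sketch. It is an assumption-laden illustration, not WSO2 CEP or Spark code: the class name MovingWindowBatcher, the windowSize and interval parameters, and the convention of returning null when no retraining is due are all hypothetical choices made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Sketch of a moving window that hands out a mini-batch every `interval` events. */
class MovingWindowBatcher {
    private final int windowSize;   // how many of the most recent events are kept
    private final int interval;     // how many new events trigger a retraining batch
    private final Deque<double[]> window = new ArrayDeque<>();
    private int sinceLastBatch = 0;

    MovingWindowBatcher(int windowSize, int interval) {
        if (windowSize <= 0 || interval <= 0) {
            throw new IllegalArgumentException("windowSize and interval must be positive");
        }
        this.windowSize = windowSize;
        this.interval = interval;
    }

    /** Adds one event; returns a copy of the current window when a retrain is due, else null. */
    List<double[]> add(double[] event) {
        window.addLast(event);
        if (window.size() > windowSize) {
            window.removeFirst();               // drop the oldest event
        }
        sinceLastBatch++;
        if (sinceLastBatch < interval) {
            return null;
        }
        sinceLastBatch = 0;
        return new ArrayList<>(window);
    }

    public static void main(String[] args) {
        MovingWindowBatcher batcher = new MovingWindowBatcher(5, 3);
        for (int i = 1; i <= 10; i++) {
            List<double[]> batch = batcher.add(new double[] {i});
            if (batch != null) {
                // A real implementation would retrain the model on `batch` here.
                System.out.println("retrain after event " + i + " on " + batch.size() + " events");
            }
        }
    }
}
```

With a window of 5 and an interval of 3, the ten events in main trigger retraining after events 3, 6 and 9, and the last batch holds events 5 to 9. Choosing the window larger than the interval means consecutive batches overlap; whether that is desirable depends on how quickly old data should stop influencing the model.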
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To start with the project, you will first need to come up with
>>>>>>>>>>>>>> a suitable plan and an architecture.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please watch the video referenced in the proposal (reference:
>>>>>>>>>>>>>> 5). It will help you get a better idea of machine learning
>>>>>>>>>>>>>> algorithms with streaming data.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Let us know if you need any help with these.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] https://docs.wso2.com/display/ML110/Machine+Learner+Algorithms
>>>>>>>>>>>>>> [2] http://wso2.com/products/machine-learner/
>>>>>>>>>>>>>> [3] https://docs.wso2.com/display/ML110/Building+from+Source#BuildingfromSource-Downloadingthesourcecheckout
>>>>>>>>>>>>>> [4] https://spark.apache.org/docs/1.4.1/mllib-guide.html
>>>>>>>>>>>>>> [5] https://spark.apache.org/docs/1.4.1/streaming-programming-guide.html
>>>>>>>>>>>>>> [6] https://spark.apache.org/docs/1.4.1/mllib-linear-methods.html#streaming-linear-regression
>>>>>>>>>>>>>> [7] https://spark.apache.org/docs/1.4.1/mllib-clustering.html#streaming-k-means
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Mar 2, 2016 at 1:19 PM, Mahesh Dananjaya <dananjayamah...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>> I am interested in contributing to proposal 6: "Predictive
>>>>>>>>>>>>>>> analytic with online data for WSO2 Machine Learner" for GSOC2
>>>>>>>>>>>>>>> this time. Since I have been engaged in some similar projects,
>>>>>>>>>>>>>>> I think it will be a great experience for me. Please let me know
>>>>>>>>>>>>>>> what you think and what you suggest. I have been going through
>>>>>>>>>>>>>>> your documents. Thank you.
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Mahesh Dananjaya.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>> Dev mailing list
>>>>>>>>>>>>>>> Dev@wso2.org
>>>>>>>>>>>>>>> http://wso2.org/cgi-bin/mailman/listinfo/dev
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Pruthuvi Maheshakya Wijewardena
>>>>>>>>>>>>>> mahesha...@wso2.com
>>>>>>>>>>>>>> +94711228855

--
*Supun Sethunga*
Software Engineer
WSO2, Inc.
http://wso2.com/
lean | enterprise | middleware
Mobile : +94 716546324