Hi Mahesh,

The deadline for submitting proposals is March 25th, 2016, so please start writing your proposal and get feedback on it.
Best regards.

On Tue, Mar 15, 2016 at 4:14 PM, Mahesh Dananjaya <dananjayamah...@gmail.com> wrote:

> Hi Maheshakya,
> OK. I have been trying some examples, splitting the datasets and training on them incrementally. I am still working on that, and I have been adding the examples to my GitHub repo too: https://github.com/dananjayamahesh/GSOC2016 . I saw that there is only Scala API support for those streaming algorithms in Spark, so my task is to develop the Java API. I will let you know my progress. Thank you very much.
> BR,
> Mahesh
>
> On Tue, Mar 15, 2016 at 3:21 PM, Maheshakya Wijewardena <mahesha...@wso2.com> wrote:
>
>> Hi Mahesh,
>>
>> No, you don't need to use Hadoop at any stage in this project. Everything you need (regarding ML algorithms) is in Spark.
>> You can also use Spark MLLib's methods to randomly split datasets.
>>
>> Best regards.
>>
>> On Mon, Mar 14, 2016 at 1:28 PM, Mahesh Dananjaya <dananjayamah...@gmail.com> wrote:
>>
>>> Hi Maheshakya,
>>> I am writing some Java programs that break the dataset into several pieces and train a model repeatedly with those pieces using Spark MLLib. Do I have to do anything with Hadoop at this stage? I am working in standalone mode. Thank you.
>>> BR,
>>> Mahesh.
>>>
>>> On Sun, Mar 13, 2016 at 6:30 PM, Maheshakya Wijewardena <mahesha...@wso2.com> wrote:
>>>
>>>> Hi Mahesh,
>>>>
>>>> You don't have to look into carbon-ml.
>>>>
>>>> Best regards.
>>>>
>>>> On Sun, Mar 13, 2016 at 5:49 PM, Mahesh Dananjaya <dananjayamah...@gmail.com> wrote:
>>>>
>>>>> Hi Maheshakya,
>>>>> I am working on some examples related to Spark and ML. Is there anything I need to do with carbon-ml? I think I don't need to look into that one, do I?
>>>>> BR,
>>>>> Mahesh
>>>>>
>>>>> On Tue, Mar 8, 2016 at 11:55 AM, Maheshakya Wijewardena <mahesha...@wso2.com> wrote:
>>>>>
>>>>>> Hi Mahesh,
>>>>>>
>>>>>>> Is that Scala API included in your current product or repo?
>>>>>>
>>>>>> No, we don't have the Scala API included. What we want is to design Java implementations of those algorithms that train with mini-batches of streaming data with the help of the aforementioned methods, so that we can include them as a CEP extension.
>>>>>>
>>>>>> To clarify, please try to write a simple Java program using Spark MLLib linear regression and k-means clustering with a sample data set (you can find a lot of data sets in the UCI repo[1]). You need to break the dataset into several pieces and train a model repeatedly with those.
>>>>>> After each training run, save the model information (such as weights and intercepts for regression and cluster centers for clustering; please check the arguments of the methods I have mentioned and save the information the model requires).
>>>>>> When training a model with a new piece of data, use those methods to initialize it, passing the saved values as the arguments. This way you can start from where you stopped in the previous run.
>>>>>>
>>>>>> Let us know your observations and feel free to ask if you need to know anything more on this.
>>>>>>
>>>>>> We'll let you know what needs to be done to include this in CEP.
>>>>>>
>>>>>> Best regards.
>>>>>>
>>>>>> On Tue, Mar 8, 2016 at 10:59 AM, Mahesh Dananjaya <dananjayamah...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Maheshakya,
>>>>>>> Great, thank you. I already have ML and CEP and am working more towards it. Is that Scala API included in your current product or repo? Thank you.
>>>>>>> BR,
>>>>>>> Mahesh.
>>>>>>>
>>>>>>> On Sun, Mar 6, 2016 at 5:49 PM, Maheshakya Wijewardena <mahesha...@wso2.com> wrote:
>>>>>>>
>>>>>>>> Hi Mahesh,
>>>>>>>>
>>>>>>>> Please find the comments inline.
>>>>>>>>
>>>>>>>>> Is the data stream passed to ML in the event publisher's format, through the event publisher?
>>>>>>>>> Or can we use the direct traffic that comes to the event receiver, or else as streams?
>>>>>>>>>
>>>>>>>> We intend to use the direct data as event streams.
>>>>>>>>
>>>>>>>>> 1.) Are the data coming from WSO2 DAS to ML coming as streams?
>>>>>>>>>
>>>>>>>> No, WSO2 ML doesn't use any event stream. The data stored in tables in DAS is loaded into ML.
>>>>>>>>
>>>>>>>>> 2.) Are there any incremental learning algorithms currently active in ML? You mentioned that there are, and that they come with the Scala API, so there is streaming support with that Scala API. In which format is the data acquired by ML in that API?
>>>>>>>>>
>>>>>>>> No, there are no incremental learning algorithms in ML. The Scala API is about Spark MLLib. MLLib supports streaming k-means and generalized linear models (linear regression variants and logistic regression) with its Scala API. What those implementations basically do is retrain the trained models with mini-batches as data arrives sequentially. There, the breaking of streaming data into mini-batches is done with the help of Spark Streaming. But we do not intend to use Spark Streaming in our implementation. What we need to do is implement a similar behavior for event streams using the Java API.
>>>>>>>> The Java API has the following methods:
>>>>>>>>
>>>>>>>> - *createModel* <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html#createModel%28org.apache.spark.mllib.linalg.Vector,%20double%29> (Vector <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/linalg/Vector.html> weights, double intercept) - for GLMs
>>>>>>>> - *setInitialModel* <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeans.html#setInitialModel%28org.apache.spark.mllib.clustering.KMeansModel%29> (KMeansModel <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeansModel.html> model) - for k-means
>>>>>>>>
>>>>>>>> With the help of these methods, we can train models again with newly arriving data while keeping the characteristics learned from the previous data. When implementing this, we need to pay attention to other parameters of incremental learning, such as data horizon and data obsolescence (indicated in the project ideas page).
>>>>>>>> We need to discuss how to add these to CEP event streams. I have added Suho to the thread for more clarification.
>>>>>>>>
>>>>>>>> Best regards.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Mar 5, 2016 at 5:15 PM, Mahesh Dananjaya <dananjayamah...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Maheshakya,
>>>>>>>>> As we intend to use WSO2 CEP to handle streaming data and implement the machine learning algorithms with Spark MLLib, is the data stream passed to ML in the event publisher's format, through the event publisher? Or can we use the direct traffic that comes to the event receiver, or else as streams? I am referring to https://docs.wso2.com/display/CEP410/User+Guide
>>>>>>>>> 1.)
>>>>>>>>> Are the data coming from WSO2 DAS to ML coming as streams?
>>>>>>>>> 2.) Are there any incremental learning algorithms currently active in ML? You mentioned that there are, and that they come with the Scala API, so there is streaming support with that Scala API. In which format is the data acquired by ML in that API?
>>>>>>>>>
>>>>>>>>> Thank you.
>>>>>>>>> BR,
>>>>>>>>> Mahesh.
>>>>>>>>>
>>>>>>>>> On Fri, Mar 4, 2016 at 2:03 PM, Maheshakya Wijewardena <mahesha...@wso2.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>
>>>>>>>>>> We had to modify the project scope a little to best suit the requirements. We will update the project idea with those concerns soon and let you know.
>>>>>>>>>>
>>>>>>>>>> We do not support streaming data in WSO2 Machine Learner at the moment. The new plan is to use WSO2 CEP to handle streaming data and implement the machine learning algorithms with Spark MLLib. You can look at the streaming k-means and streaming linear regression implementations in MLLib. Currently, that API is only for Scala. Our need is to get the Java APIs of k-means and generalized linear models to support incremental learning with streaming data. This has to be done as mini-batch learning, since these algorithms operate as stochastic gradient descent, so that any learning with new data can be done on top of the previously learned models. So please go through those APIs[1][2][3] and try to get an idea. Also, please try to understand how event streams work in WSO2 CEP[4][5].
>>>>>>>>>>
>>>>>>>>>> Best regards.
>>>>>>>>>>
>>>>>>>>>> [1] http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html
>>>>>>>>>> [2] http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeans.html
>>>>>>>>>> [3] http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/classification/LogisticRegressionWithSGD.html
>>>>>>>>>> [4] https://docs.wso2.com/display/CEP310/Working+with+Event+Streams
>>>>>>>>>> [5] https://docs.wso2.com/display/CEP310/Working+with+Execution+Plans
>>>>>>>>>>
>>>>>>>>>> On Fri, Mar 4, 2016 at 11:26 AM, Mahesh Dananjaya <dananjayamah...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Maheshakya,
>>>>>>>>>>> Give me some time to go through your ML package. Does the current product have any stream data support? I did some university projects related to machine learning, with regression, modelling, factor analysis, cluster analysis and classification problems (discriminant analysis) using SVMs (support vector machines), neural networks, LS classification and ML (maximum likelihood). Give me some time to see how the WSO2 architecture works; then I can come up with a good architecture. Thank you.
>>>>>>>>>>> BR,
>>>>>>>>>>> Mahesh.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Mar 2, 2016 at 2:41 PM, Mahesh Dananjaya <dananjayamah...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Maheshakya,
>>>>>>>>>>>> Thank you for the resources. I will go through them, and I am looking forward to this proposed project. Thank you.
>>>>>>>>>>>> BR,
>>>>>>>>>>>> Mahesh.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Mar 2, 2016 at 1:52 PM, Maheshakya Wijewardena <mahesha...@wso2.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you for your interest in this project.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We would like to know what type of similar projects you have worked on. You may have seen that WSO2 Machine Learner supports several learning algorithms at the moment[1]. This project intends to leverage the existing algorithms in WSO2 Machine Learner to support streaming data. As a first step, you can get an idea of what WSO2 Machine Learner does and how it operates. You can download WSO2 Machine Learner from the product page[2] and get the source code[3]. ML uses Apache Spark MLLib[4] for its algorithms, so it's better to read and understand what that does as well.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In order to get an idea of the deliverables and the scope of this project, try to understand how Spark Streaming[5] (see the examples) handles streaming data. Also, have a look at the streaming algorithms[6][7] supported by MLLib. There are two approaches to employing incremental learning in ML discussed in the project proposals page. These streaming algorithms can be used directly in the first approach. For the other approach, your implementation should contain a procedure to create mini-batches of relevant sizes from the streaming data (i.e. a moving window) and periodically retrain the same algorithm.
>>>>>>>>>>>>>
>>>>>>>>>>>>> To start with the project, you will first need to come up with a suitable plan and an architecture.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please watch the video referenced in the proposal (reference: 5).
>>>>>>>>>>>>> It will help you get a better idea about machine learning algorithms with streaming data.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Let us know if you need any help with these.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://docs.wso2.com/display/ML110/Machine+Learner+Algorithms
>>>>>>>>>>>>> [2] http://wso2.com/products/machine-learner/
>>>>>>>>>>>>> [3] https://docs.wso2.com/display/ML110/Building+from+Source#BuildingfromSource-Downloadingthesourcecheckout
>>>>>>>>>>>>> [4] https://spark.apache.org/docs/1.4.1/mllib-guide.html
>>>>>>>>>>>>> [5] https://spark.apache.org/docs/1.4.1/streaming-programming-guide.html
>>>>>>>>>>>>> [6] https://spark.apache.org/docs/1.4.1/mllib-linear-methods.html#streaming-linear-regression
>>>>>>>>>>>>> [7] https://spark.apache.org/docs/1.4.1/mllib-clustering.html#streaming-k-means
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Mar 2, 2016 at 1:19 PM, Mahesh Dananjaya <dananjayamah...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>> I am interested in contributing to proposal 6: "Predictive analytic with online data for WSO2 Machine Learner" for GSOC2 this time. Since I have been engaged in some similar projects, I think it will be a great experience for me. Please let me know what you think and what you suggest. I have been going through your documents. Thank you.
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Mahesh Dananjaya.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Dev mailing list
>>>>>>>>>>>>>> Dev@wso2.org
>>>>>>>>>>>>>> http://wso2.org/cgi-bin/mailman/listinfo/dev

--
Pruthuvi Maheshakya Wijewardena
mahesha...@wso2.com
+94711228855
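A minimal, Spark-free sketch of the incremental training idea discussed in this thread: train on one piece of the data, save the weights and intercept, and start the next run from those saved values instead of from scratch. The names `IncrementalLinearRegression`, `Model` and `trainOnBatch`, the step size and the synthetic data are all assumptions made for illustration. They are not Spark MLLib or WSO2 APIs, and plain per-sample SGD stands in here for MLLib's own optimizer.

```java
public class IncrementalLinearRegression {

    /** Saved model state: the weights and intercept to persist after each run. */
    public static final class Model {
        public final double[] weights;
        public final double intercept;

        public Model(double[] weights, double intercept) {
            this.weights = weights.clone();
            this.intercept = intercept;
        }
    }

    /**
     * Runs per-sample SGD on one piece of data, starting from the given model
     * rather than from zeros, and returns the updated model.
     */
    public static Model trainOnBatch(Model start, double[][] xs, double[] ys,
                                     double stepSize, int epochs) {
        double[] w = start.weights.clone();
        double b = start.intercept;
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (int i = 0; i < xs.length; i++) {
                // Prediction error for sample i under the current model.
                double err = b - ys[i];
                for (int j = 0; j < w.length; j++) {
                    err += w[j] * xs[i][j];
                }
                // Gradient step on the squared error for this sample.
                for (int j = 0; j < w.length; j++) {
                    w[j] -= stepSize * err * xs[i][j];
                }
                b -= stepSize * err;
            }
        }
        return new Model(w, b);
    }

    public static void main(String[] args) {
        // Synthetic data for y = 2x + 1, split into three pieces (made up for this sketch).
        double[][][] pieces = {
            {{0.0}, {0.1}, {0.2}, {0.3}},
            {{0.4}, {0.5}, {0.6}},
            {{0.7}, {0.8}, {0.9}, {1.0}}
        };
        Model model = new Model(new double[] {0.0}, 0.0); // the first run starts from zeros
        for (double[][] xs : pieces) {
            double[] ys = new double[xs.length];
            for (int i = 0; i < xs.length; i++) {
                ys[i] = 2.0 * xs[i][0] + 1.0;
            }
            // Resume from the model saved after the previous piece.
            model = trainOnBatch(model, xs, ys, 0.2, 2000);
            System.out.printf("weight=%.4f intercept=%.4f%n", model.weights[0], model.intercept);
        }
    }
}
```

In a Spark-based version, the saved weights and intercept would presumably be handed to the MLLib methods named in the thread rather than to this hypothetical `trainOnBatch`; the loop structure (train, save, resume) is the part this sketch is meant to show.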