Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]

Mahesh Dananjaya Tue, 08 Mar 2016 01:53:45 -0800

Hi Maheshakya,
Thank you very much. I am already onto that and will let you know soon. Thank you.
BR,
mahesh.


On Tue, Mar 8, 2016 at 11:55 AM, Maheshakya Wijewardena <mahesha...@wso2.com
> wrote:

> Hi Mahesh,
>
> Is that Scala API included in your current product or repo?
>
>
> No, we don't have the Scala API included. What we want is to design
> Java implementations of those algorithms that train on mini-batches of
> streaming data with the help of the aforementioned methods, so that we can
> include them as a CEP extension.
>
> To clarify, please try to write a simple Java program using Spark MLLib
> linear regression and k-means clustering with a sample data set (you can
> find a lot of data sets in the UCI repo[1]). You need to break the data set
> into several pieces and train a model repeatedly with those.
> After each training run, save the model information (such as the weights
> and intercept for regression and the cluster centers for clustering; please
> check the arguments of the methods I have mentioned and save the
> information they require).
> When training a model with a new piece of data, use those methods to
> initialize it, supplying the saved values as the arguments. This way you can
> start from where you stopped in the previous run.
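
The procedure described above (split the data set into pieces, train on each piece, save the weights and intercept, and start the next run from the saved values) can be sketched without Spark. What follows is a minimal plain-Java illustration under stated assumptions: the class `WarmStartRegression`, its `Model` holder, the `trainOnBatch` helper and the synthetic data y = 2x + 1 are hypothetical choices made for this sketch, and plain batch gradient descent stands in for the SGD trainer. It is not the MLLib API.

```java
import java.util.Arrays;

/** Spark-free sketch (hypothetical names): regression trained piece by piece, resuming from saved state. */
class WarmStartRegression {

    /** The model information saved between training runs. */
    static final class Model {
        final double[] weights;
        final double intercept;

        Model(double[] weights, double intercept) {
            this.weights = weights.clone();
            this.intercept = intercept;
        }
    }

    /** Runs gradient descent on one piece of data, starting from the given model instead of from zero. */
    static Model trainOnBatch(Model start, double[][] x, double[] y, int iterations, double stepSize) {
        double[] w = start.weights.clone();
        double b = start.intercept;
        int n = x.length;
        for (int it = 0; it < iterations; it++) {
            double[] gradW = new double[w.length];
            double gradB = 0.0;
            for (int i = 0; i < n; i++) {
                double err = b - y[i];                       // prediction minus label
                for (int j = 0; j < w.length; j++) err += w[j] * x[i][j];
                for (int j = 0; j < w.length; j++) gradW[j] += err * x[i][j];
                gradB += err;
            }
            for (int j = 0; j < w.length; j++) w[j] -= stepSize * gradW[j] / n;
            b -= stepSize * gradB / n;
        }
        return new Model(w, b);
    }

    /** Assumed toy data: y = 2x + 1 with x in [0, 1), split into five consecutive pieces. */
    static Model demo() {
        int pieces = 5;
        int perPiece = 20;
        Model model = new Model(new double[] {0.0}, 0.0);   // untrained starting point
        for (int p = 0; p < pieces; p++) {
            double[][] x = new double[perPiece][1];
            double[] y = new double[perPiece];
            for (int i = 0; i < perPiece; i++) {
                x[i][0] = (p * perPiece + i) / 100.0;
                y[i] = 2.0 * x[i][0] + 1.0;
            }
            // The previous model is the starting point for the next piece.
            model = trainOnBatch(model, x, y, 2000, 0.5);
        }
        return model;
    }

    public static void main(String[] args) {
        Model m = demo();
        System.out.println("weights=" + Arrays.toString(m.weights) + " intercept=" + m.intercept);
    }
}
```

Because each call receives the model returned by the previous call, no piece of data has to be revisited; only the weights and the intercept need to be stored between runs.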
>
> Let us know your observations and feel free to ask if you need to know
> anything more on this.
>
> We'll let you know what needs to be done to include this in CEP.
>
> Best regards.
>
> On Tue, Mar 8, 2016 at 10:59 AM, Mahesh Dananjaya <
> dananjayamah...@gmail.com> wrote:
>
>> Hi Maheshakya,
>> Great, thank you. I already have ML and CEP and am working more with them.
>> Is that Scala API included in your current product or repo? Thank you.
>> BR,
>> Mahesh.
>>
>> On Sun, Mar 6, 2016 at 5:49 PM, Maheshakya Wijewardena <
>> mahesha...@wso2.com> wrote:
>>
>>> Hi Mahesh,
>>>
>>> Please find the comments inline.
>>>
>>> Is the data stream taken into ML in the event publisher's format, through
>>>> the event publisher? Or can we use the direct traffic that comes to the
>>>> event receiver, or else take it as streams?
>>>>
>>> We intend to use the direct data as event streams.
>>>
>>> 1.) Does the data coming from WSO2 DAS to ML arrive as streams?
>>>>
>>> No, WSO2 ML doesn't use any event streams. The data stored in tables in
>>> DAS is loaded into ML.
>>>
>>> 2.) Are there any incremental learning algorithms currently active in
>>>> ML? You mentioned that there are, and that they come with the Scala API.
>>>> So there is streaming support in that Scala API. In what format does that
>>>> API bring the data into ML?
>>>>
>>> No, there are no incremental learning algorithms in ML. The Scala API I
>>> mentioned belongs to Spark MLLib. MLLib supports streaming k-means and
>>> generalized linear models (linear regression variants and logistic
>>> regression) in its Scala API. What those implementations basically do is
>>> retrain the trained models with mini-batches as data arrives sequentially.
>>> There, the breaking of streaming data into mini-batches is done with the
>>> help of Spark Streaming. But we do not intend to use Spark Streaming in our
>>> implementation. What we need to do is implement similar behavior for
>>> event streams using the Java API. The Java API has the following methods:
>>>
>>>    - *createModel*(Vector weights, double intercept) - for GLMs
>>>      <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html#createModel%28org.apache.spark.mllib.linalg.Vector,%20double%29>
>>>      Vector:
>>>      <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/linalg/Vector.html>
>>>    - *setInitialModel*(KMeansModel model) - for k-means
>>>      <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeans.html#setInitialModel%28org.apache.spark.mllib.clustering.KMeansModel%29>
>>>      KMeansModel:
>>>      <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeansModel.html>
>>>
>>> With the help of these methods, we can train models again with newly
>>> arriving data, keeping the characteristics learned with the previous data.
>>> When implementing this, we need to pay attention to other parameters of
>>> incremental learning such as data horizon and data obsolescence (indicated
>>> in the project ideas page).
>>> We need to discuss how to add these to CEP event streams. I have
>>> added Suho to the thread for more clarification.
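
One way to make data obsolescence concrete for k-means is a decay factor that down-weights what earlier batches contributed to each cluster center. The sketch below is plain Java, not Spark, and the class `DecayedKMeans` with its fields and `demo` method is hypothetical. It starts from given initial centers (the role an initial model plays) and, for each new mini-batch, applies the update c' = (c * n * a + s) / (n * a + m), where n is the weight accumulated so far for a center, a is the decay factor, s is the sum of the new points assigned to that center and m is their count. This is similar in spirit to a forgetful streaming k-means update, offered here only as an illustration.

```java
/** Spark-free sketch (hypothetical names): k-means centers updated per mini-batch with a decay factor. */
class DecayedKMeans {
    final double[][] centers;   // current cluster centers, seeded from an initial model
    final double[] weights;     // effective number of points behind each center
    final double decay;         // 1.0 keeps all history, 0.0 keeps only the latest batch

    DecayedKMeans(double[][] initialCenters, double decay) {
        this.centers = new double[initialCenters.length][];
        for (int c = 0; c < initialCenters.length; c++) {
            this.centers[c] = initialCenters[c].clone();
        }
        this.weights = new double[initialCenters.length];
        this.decay = decay;
    }

    /** Index of the center closest to p (squared Euclidean distance). */
    int nearest(double[] p) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int j = 0; j < p.length; j++) {
                double diff = p[j] - centers[c][j];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /** Assigns the batch to the current centers, then blends old centers with the batch sums. */
    void update(double[][] batch) {
        int k = centers.length;
        int d = centers[0].length;
        double[][] sums = new double[k][d];
        double[] counts = new double[k];
        for (double[] p : batch) {
            int c = nearest(p);
            counts[c] += 1.0;
            for (int j = 0; j < d; j++) sums[c][j] += p[j];
        }
        for (int c = 0; c < k; c++) {
            double old = weights[c] * decay;        // discounted weight of past data
            double total = old + counts[c];
            if (counts[c] > 0) {
                for (int j = 0; j < d; j++) {
                    centers[c][j] = (centers[c][j] * old + sums[c][j]) / total;
                }
            }
            weights[c] = total;
        }
    }

    /** Assumed toy data: two batches; returns the first coordinate of the first center. */
    static double demo(double decay) {
        DecayedKMeans km = new DecayedKMeans(new double[][] {{0, 0}, {10, 10}}, decay);
        km.update(new double[][] {{1, 1}, {3, 3}, {9, 9}, {11, 11}});
        km.update(new double[][] {{4, 4}, {4, 4}});
        return km.centers[0][0];
    }

    public static void main(String[] args) {
        System.out.println(demo(1.0) + " " + demo(0.5) + " " + demo(0.0));
    }
}
```

With a decay of 1.0 every past point counts as much as a new one; lower values let the centers follow recent data more quickly, which is one possible reading of the data horizon parameter.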
>>>
>>> Best regards.
>>>
>>>
>>> On Sat, Mar 5, 2016 at 5:15 PM, Mahesh Dananjaya <
>>> dananjayamah...@gmail.com> wrote:
>>>
>>>> Hi Maheshakya,
>>>> Since we plan to use WSO2 CEP to handle streaming data and implement
>>>> the machine learning algorithms with Spark MLLib, is the data stream taken
>>>> into ML in the event publisher's format, through the event publisher? Or
>>>> can we use the direct traffic that comes to the event receiver, or else
>>>> take it as streams? I am referring to
>>>> https://docs.wso2.com/display/CEP410/User+Guide
>>>>     1.) Does the data coming from WSO2 DAS to ML arrive as streams?
>>>>     2.) Are there any incremental learning algorithms currently active
>>>> in ML? You mentioned that there are, and that they come with the Scala
>>>> API. So there is streaming support in that Scala API. In what format does
>>>> that API bring the data into ML?
>>>>
>>>> Thank you.
>>>> BR,
>>>> Mahesh.
>>>>
>>>> On Fri, Mar 4, 2016 at 2:03 PM, Maheshakya Wijewardena <
>>>> mahesha...@wso2.com> wrote:
>>>>
>>>>> Hi Mahesh,
>>>>>
>>>>> We had to modify the project scope a little to best suit the
>>>>> requirements. We will update the project idea with those concerns soon and
>>>>> let you know.
>>>>>
>>>>> We do not support streaming data in WSO2 Machine Learner at the
>>>>> moment. The new focus is to use WSO2 CEP to handle streaming data and
>>>>> implement the machine learning algorithms with Spark MLLib. You can look
>>>>> at the streaming k-means and streaming linear regression implementations
>>>>> in MLLib. Currently, that API is available only for Scala. Our need is to
>>>>> get the Java APIs of k-means and generalized linear models to support
>>>>> incremental learning with streaming data. This has to be done as
>>>>> mini-batch learning, since these algorithms operate as stochastic
>>>>> gradient descent, so that any learning with new data can be done on top
>>>>> of the previously learned models.
>>>>> So please go through those APIs[1][2][3] and try to get an idea.
>>>>> Also please try to understand how event streams work in WSO2 CEP
>>>>> [4][5].
>>>>>
>>>>> Best regards.
>>>>>
>>>>> [1]
>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html
>>>>> [2]
>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeans.html
>>>>> [3]
>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/classification/LogisticRegressionWithSGD.html
>>>>> [4] https://docs.wso2.com/display/CEP310/Working+with+Event+Streams
>>>>> [5] https://docs.wso2.com/display/CEP310/Working+with+Execution+Plans
>>>>>
>>>>> On Fri, Mar 4, 2016 at 11:26 AM, Mahesh Dananjaya <
>>>>> dananjayamah...@gmail.com> wrote:
>>>>>
>>>>>> Hi Maheshakya,
>>>>>> Give me some time to go through your ML package. Does the current product
>>>>>> have any streaming data support? I did some university projects related
>>>>>> to machine learning, covering regression, modelling, factor analysis,
>>>>>> cluster analysis and classification problems (discriminant analysis)
>>>>>> with SVMs (support vector machines), neural networks, LS classification
>>>>>> and ML (maximum likelihood). Give me some time to see how the WSO2
>>>>>> architecture works. Then I can come up with a good architecture. Thank you.
>>>>>> BR,
>>>>>> Mahesh.
>>>>>>
>>>>>> On Wed, Mar 2, 2016 at 2:41 PM, Mahesh Dananjaya <
>>>>>> dananjayamah...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Maheshakya,
>>>>>>> Thank you for the resources. I will go through them and am looking
>>>>>>> forward to this proposed project. Thank you.
>>>>>>> BR,
>>>>>>> Mahesh.
>>>>>>>
>>>>>>> On Wed, Mar 2, 2016 at 1:52 PM, Maheshakya Wijewardena <
>>>>>>> mahesha...@wso2.com> wrote:
>>>>>>>
>>>>>>>> Hi Mahesh,
>>>>>>>>
>>>>>>>> Thank you for your interest in this project.
>>>>>>>>
>>>>>>>> We would like to know what kind of similar projects you have worked
>>>>>>>> on. You may have seen that WSO2 Machine Learner supports several
>>>>>>>> learning algorithms at the moment[1]. This project intends to leverage
>>>>>>>> the existing algorithms in WSO2 Machine Learner to support streaming
>>>>>>>> data. As a first step, you can get an idea of what WSO2 Machine Learner
>>>>>>>> does and how it operates. You can download WSO2 Machine Learner from
>>>>>>>> the product page[2] and get the source code[3]. ML uses Apache Spark
>>>>>>>> MLLib[4] for its algorithms, so it's better to read and understand what
>>>>>>>> it does as well.
>>>>>>>>
>>>>>>>> In order to get an idea about the deliverables and the scope of
>>>>>>>> this project, try to understand how Spark Streaming[5] (see examples)
>>>>>>>> handles streaming data. Also, have a look at the streaming
>>>>>>>> algorithms[6][7] supported by MLLib. Two approaches to employing
>>>>>>>> incremental learning in ML are discussed in the project proposals page.
>>>>>>>> These streaming algorithms can be used directly in the first approach.
>>>>>>>> For the other approach, your implementation should contain a procedure
>>>>>>>> to create mini-batches of relevant sizes from streaming data (i.e. a
>>>>>>>> moving window) and do periodic retraining of the same algorithm.
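
The second approach, creating mini-batches from a stream with a moving window and retraining periodically, can be sketched in plain Java. Everything here is an assumption made for illustration: the class `SlidingWindowBatcher`, the `windowSize` and `slide` parameters, and the callback that would trigger retraining are hypothetical and do not come from WSO2 CEP or Spark.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

/** Sketch (hypothetical names): keeps the last windowSize events and emits them every slide events. */
class SlidingWindowBatcher<T> {
    private final int windowSize;                 // how many recent events one mini-batch holds
    private final int slide;                      // how many new events arrive between two batches
    private final Consumer<List<T>> onBatch;      // e.g. a retraining step; supplied by the caller
    private final Deque<T> window = new ArrayDeque<>();
    private int sinceLastBatch = 0;

    SlidingWindowBatcher(int windowSize, int slide, Consumer<List<T>> onBatch) {
        this.windowSize = windowSize;
        this.slide = slide;
        this.onBatch = onBatch;
    }

    /** Called once per incoming event. */
    void onEvent(T event) {
        window.addLast(event);
        if (window.size() > windowSize) {
            window.removeFirst();                 // the oldest event leaves the window
        }
        sinceLastBatch++;
        if (window.size() == windowSize && sinceLastBatch >= slide) {
            sinceLastBatch = 0;
            onBatch.accept(new ArrayList<>(window));   // hand a copy to the trainer
        }
    }

    /** Feeds the events 1..8 through a window of 4 that slides by 2 and collects the batches. */
    static List<List<Integer>> demo() {
        List<List<Integer>> batches = new ArrayList<>();
        SlidingWindowBatcher<Integer> batcher = new SlidingWindowBatcher<>(4, 2, batches::add);
        for (int i = 1; i <= 8; i++) {
            batcher.onEvent(i);
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints [[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]]
    }
}
```

Setting `slide` equal to `windowSize` gives non-overlapping batches; a smaller `slide` retrains more often on overlapping data.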
>>>>>>>>
>>>>>>>> To start with the project, you will need to come up with a suitable
>>>>>>>> plan and an architecture first.
>>>>>>>>
>>>>>>>> Please watch the video referenced in the proposal (reference: 5).
>>>>>>>> It will help you get a better idea of machine learning algorithms
>>>>>>>> with streaming data.
>>>>>>>>
>>>>>>>> Let us know if you need any help with these.
>>>>>>>>
>>>>>>>> Best regards
>>>>>>>>
>>>>>>>> [1] https://docs.wso2.com/display/ML110/Machine+Learner+Algorithms
>>>>>>>> [2] http://wso2.com/products/machine-learner/
>>>>>>>> [3]
>>>>>>>> https://docs.wso2.com/display/ML110/Building+from+Source#BuildingfromSource-Downloadingthesourcecheckout
>>>>>>>> [4] https://spark.apache.org/docs/1.4.1/mllib-guide.html
>>>>>>>> [5]
>>>>>>>> https://spark.apache.org/docs/1.4.1/streaming-programming-guide.html
>>>>>>>> [6]
>>>>>>>> https://spark.apache.org/docs/1.4.1/mllib-linear-methods.html#streaming-linear-regression
>>>>>>>> [7]
>>>>>>>> https://spark.apache.org/docs/1.4.1/mllib-clustering.html#streaming-k-means
>>>>>>>>
>>>>>>>> On Wed, Mar 2, 2016 at 1:19 PM, Mahesh Dananjaya <
>>>>>>>> dananjayamah...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>> I am interested in contributing to proposal 6: "Predictive analytic
>>>>>>>>> with online data for WSO2 Machine Learner" for GSOC this time. Since I
>>>>>>>>> have been engaged in some similar projects, I think it will be a great
>>>>>>>>> experience for me. Please let me know what you think and what you
>>>>>>>>> suggest. I have been going through your documents. Thank you.
>>>>>>>>> regards,
>>>>>>>>> Mahesh Dananjaya.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Dev mailing list
>>>>>>>>> Dev@wso2.org
>>>>>>>>> http://wso2.org/cgi-bin/mailman/listinfo/dev
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Pruthuvi Maheshakya Wijewardena
>>>>>>>> mahesha...@wso2.com
>>>>>>>> +94711228855
