Hi Mahesh,

The deadline for submitting proposals is March 25th, 2016, so please
start writing your proposal and get feedback on it.

Best regards.

On Tue, Mar 15, 2016 at 4:14 PM, Mahesh Dananjaya <dananjayamah...@gmail.com> wrote:

> Hi Maheshakya,
> OK. I have been trying some examples, splitting them and training
> incrementally, and I am still working on that. I have been adding them to my
> GitHub repo too: https://github.com/dananjayamahesh/GSOC2016 . I saw that
> Spark only has Scala API support for those streaming algorithms, so my task
> is to develop the Java API. I will let you know my progress. Thank you very much.
> BR,
> Mahesh
>
> On Tue, Mar 15, 2016 at 3:21 PM, Maheshakya Wijewardena <
> mahesha...@wso2.com> wrote:
>
>> Hi Mahesh,
>>
>> No, you don't need to use Hadoop at any stage of this project. Everything
>> you need (regarding ML algorithms) is in Spark.
>> You can also use Spark MLlib's methods to randomly split datasets.
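A minimal JDK-only sketch of what such a random split does: each record is assigned independently to one piece with probability proportional to that piece's weight, so piece sizes are only approximately proportional. This mirrors the behaviour of Spark's `JavaRDD.randomSplit(double[] weights, long seed)`, which a Spark program would normally call directly (e.g. `data.randomSplit(new double[]{1, 1, 1, 1}, 42L)`). The `DatasetSplitter` class below is a hypothetical stand-in so the idea can be run without a Spark installation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper (not Spark code): assigns each row to one of several pieces
// at random, with probability proportional to the given weights.
final class DatasetSplitter {
    private DatasetSplitter() {}

    static <T> List<List<T>> randomSplit(List<T> rows, double[] weights, long seed) {
        double total = 0.0;
        for (double w : weights) {
            if (w < 0) throw new IllegalArgumentException("weights must be non-negative");
            total += w;
        }
        if (total <= 0) throw new IllegalArgumentException("weights must sum to a positive value");

        // Cumulative, normalised upper bounds, e.g. {1, 1, 2} -> {0.25, 0.5, 1.0}.
        double[] bounds = new double[weights.length];
        double acc = 0.0;
        for (int i = 0; i < weights.length; i++) {
            acc += weights[i] / total;
            bounds[i] = acc;
        }

        List<List<T>> pieces = new ArrayList<>();
        for (int i = 0; i < weights.length; i++) pieces.add(new ArrayList<>());

        Random rng = new Random(seed);  // fixed seed -> reproducible split
        for (T row : rows) {
            double u = rng.nextDouble();
            int piece = 0;
            // First piece whose upper bound exceeds the draw; the last piece
            // absorbs any floating-point rounding at the top end.
            while (piece < bounds.length - 1 && u >= bounds[piece]) piece++;
            pieces.get(piece).add(row);
        }
        return pieces;
    }
}
```

Each returned piece can then be used as one training run's data, in order.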
>>
>> Best regards.
>>
>> On Mon, Mar 14, 2016 at 1:28 PM, Mahesh Dananjaya <
>> dananjayamah...@gmail.com> wrote:
>>
>>> Hi Maheshakya,
>>> I am writing some Java programs that break the dataset into several pieces
>>> and train a model repeatedly with those pieces using Spark MLlib. Do I have
>>> to do anything with Hadoop at this stage? I am working in standalone mode.
>>> Thank you.
>>> BR,
>>> Mahesh.
>>>
>>> On Sun, Mar 13, 2016 at 6:30 PM, Maheshakya Wijewardena <
>>> mahesha...@wso2.com> wrote:
>>>
>>>> Hi Mahesh,
>>>>
>>>> You don't have to look into carbon-ml.
>>>>
>>>> Best regards.
>>>>
>>>> On Sun, Mar 13, 2016 at 5:49 PM, Mahesh Dananjaya <
>>>> dananjayamah...@gmail.com> wrote:
>>>>
>>>>> Hi Maheshakya,
>>>>> I am working on some examples related to Spark and ML. Is there
>>>>> anything to do with carbon-ml? I think I don't need to look into that
>>>>> one, do I?
>>>>> BR,
>>>>> Mahesh
>>>>>
>>>>> On Tue, Mar 8, 2016 at 11:55 AM, Maheshakya Wijewardena <
>>>>> mahesha...@wso2.com> wrote:
>>>>>
>>>>>> Hi Mahesh,
>>>>>>
>>>>>>> Is that Scala API included in your current product or repo?
>>>>>>
>>>>>>
>>>>>> No, we don't have the Scala API included. What we want is to design
>>>>>> Java implementations of those algorithms that train with mini-batches of
>>>>>> streaming data, with the help of the aforementioned methods, so that we
>>>>>> can include them as a CEP extension.
>>>>>>
>>>>>> To clarify, please try to write a simple Java program using Spark
>>>>>> MLlib linear regression and k-means clustering with a sample data set
>>>>>> (you can find a lot of data sets in the UCI repo [1]). You need to break
>>>>>> the dataset into several pieces and train a model repeatedly with those.
>>>>>> After each training run, save the model information (such as the weights
>>>>>> and intercept for regression, and the cluster centers for clustering;
>>>>>> please check the arguments of the methods I have mentioned and save the
>>>>>> required information of the model).
>>>>>> When training a model with a new piece of data, use those methods to
>>>>>> initialize it and pass in the saved values as the arguments. This way you
>>>>>> can start from where you stopped in the previous run.
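The save-and-resume procedure above can be sketched without Spark as gradient descent for least-squares linear regression, run one piece at a time. This is a hypothetical JDK-only illustration, not MLlib's implementation: `IncrementalLinearRegression`, `ModelState`, the fixed step size and the iteration count are assumptions. What it shows is that the weights and intercept are the only state that must be saved for the next run to continue where the previous one stopped. In MLlib, the GLM trainers accept starting weights through `run(input, initialWeights)`; how a saved intercept is carried over should be checked against the Spark version in use.

```java
// Hypothetical sketch (not MLlib code): linear regression trained by gradient
// descent one piece of data at a time. The weights and intercept are the only
// state kept between runs, so each run resumes from the previous result.
final class IncrementalLinearRegression {

    /** Model information to save after each run: weights and intercept. */
    static final class ModelState {
        final double[] weights;
        final double intercept;

        ModelState(double[] weights, double intercept) {
            this.weights = weights.clone();
            this.intercept = intercept;
        }
    }

    private IncrementalLinearRegression() {}

    static double predict(double[] w, double b, double[] x) {
        double y = b;
        for (int j = 0; j < w.length; j++) {
            y += w[j] * x[j];
        }
        return y;
    }

    /**
     * Runs `iterations` gradient steps over one piece of data (minimising half the
     * mean squared error), starting from the saved state instead of zero weights.
     */
    static ModelState train(ModelState start, double[][] xs, double[] ys,
                            double stepSize, int iterations) {
        double[] w = start.weights.clone();
        double b = start.intercept;
        int n = xs.length;
        for (int it = 0; it < iterations; it++) {
            double[] gradW = new double[w.length];
            double gradB = 0.0;
            for (int i = 0; i < n; i++) {
                double err = predict(w, b, xs[i]) - ys[i];
                for (int j = 0; j < w.length; j++) {
                    gradW[j] += err * xs[i][j] / n;
                }
                gradB += err / n;
            }
            for (int j = 0; j < w.length; j++) {
                w[j] -= stepSize * gradW[j];
            }
            b -= stepSize * gradB;
        }
        return new ModelState(w, b);
    }
}
```

Usage: start from `new IncrementalLinearRegression.ModelState(new double[numFeatures], 0.0)` and call `train` once per piece, each time passing in the state returned by the previous call.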
>>>>>>
>>>>>> Let us know your observations and feel free to ask if you need to
>>>>>> know anything more on this.
>>>>>>
>>>>>> We'll let you know what needs to be done to include this in CEP.
>>>>>>
>>>>>> Best regards.
>>>>>>
>>>>>> On Tue, Mar 8, 2016 at 10:59 AM, Mahesh Dananjaya <
>>>>>> dananjayamah...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Maheshakya,
>>>>>>> Great, thank you. I already have ML and CEP and am working more towards
>>>>>>> it. Is that Scala API included in your current product or repo? Thank
>>>>>>> you.
>>>>>>> BR,
>>>>>>> Mahesh.
>>>>>>>
>>>>>>> On Sun, Mar 6, 2016 at 5:49 PM, Maheshakya Wijewardena <
>>>>>>> mahesha...@wso2.com> wrote:
>>>>>>>
>>>>>>>> Hi Mahesh,
>>>>>>>>
>>>>>>>> Please find the comments inline.
>>>>>>>>
>>>>>>>>> Is the data stream taken to ML in the event publisher's format,
>>>>>>>>> through the event publisher? Or can we use the direct traffic that
>>>>>>>>> comes to the event receiver, or else use it as streams?
>>>>>>>>>
>>>>>>>> We intend to use the direct data as event streams.
>>>>>>>>
>>>>>>>>> 1.) Is the data coming from WSO2 DAS to ML coming in as streams?
>>>>>>>>>
>>>>>>>> No, WSO2 ML doesn't use any event streams. The data stored in tables
>>>>>>>> in DAS is loaded into ML.
>>>>>>>>
>>>>>>>>> 2.) Are there any incremental learning algorithms currently active
>>>>>>>>> in ML? You mentioned that there are, and that they are in the Scala
>>>>>>>>> API. So there is streaming support in that Scala API. In that API, in
>>>>>>>>> which format is the data acquired by ML?
>>>>>>>>>
>>>>>>>> No, there are no incremental learning algorithms in ML. The Scala
>>>>>>>> API is about Spark MLlib. MLlib supports streaming k-means and streaming
>>>>>>>> generalized linear models (linear regression variants and logistic
>>>>>>>> regression) with the Scala API. What those implementations basically do
>>>>>>>> is retrain the trained models with mini-batches as data arrives
>>>>>>>> sequentially. There, the streaming data is broken into mini-batches
>>>>>>>> with the help of Spark Streaming. But we do not intend to use Spark
>>>>>>>> Streaming in our implementation. What we need to do is implement
>>>>>>>> similar behavior for event streams using the Java API. The Java API has
>>>>>>>> the following methods:
>>>>>>>>
>>>>>>>>    - *createModel*(Vector weights, double intercept) - for GLMs
>>>>>>>>      <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html#createModel%28org.apache.spark.mllib.linalg.Vector,%20double%29>
>>>>>>>>      (Vector: <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/linalg/Vector.html>)
>>>>>>>>    - *setInitialModel*(KMeansModel model) - for k-means
>>>>>>>>      <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeans.html#setInitialModel%28org.apache.spark.mllib.clustering.KMeansModel%29>
>>>>>>>>      (KMeansModel: <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeansModel.html>)
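To illustrate what `setInitialModel` enables, here is a hypothetical JDK-only sketch of a mini-batch k-means update that starts from previously saved centers instead of a fresh initialisation. The update follows the rule described in the MLlib streaming k-means documentation (each center moves toward the mean of its newly assigned points, weighted by how many points it has already absorbed, with a decay factor where 1.0 means no forgetting), but `MiniBatchKMeans` is not Spark's class and omits details such as handling clusters that stop receiving points.

```java
// Hypothetical sketch (not Spark code) of a mini-batch k-means update that continues
// from saved centers. Per cluster, with m new points whose mean is `mean`:
//   n' = n * decay + m,   c' = (c * n * decay + mean * m) / n'
final class MiniBatchKMeans {
    final double[][] centers;   // saved cluster centers
    final double[] counts;      // (decayed) number of points absorbed per cluster
    final double decay;         // 1.0 = remember everything, < 1.0 = forget old data

    MiniBatchKMeans(double[][] initialCenters, double[] initialCounts, double decay) {
        this.centers = new double[initialCenters.length][];
        for (int k = 0; k < initialCenters.length; k++) {
            this.centers[k] = initialCenters[k].clone();
        }
        this.counts = initialCounts.clone();
        this.decay = decay;
    }

    /** Index of the center nearest to x (squared Euclidean distance). */
    int closest(double[] x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centers.length; k++) {
            double d = 0.0;
            for (int j = 0; j < x.length; j++) {
                double diff = x[j] - centers[k][j];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = k; }
        }
        return best;
    }

    /** Absorbs one mini-batch, moving each center toward its new points. */
    void update(double[][] batch) {
        int numClusters = centers.length;
        int dim = centers[0].length;
        double[][] sums = new double[numClusters][dim];
        int[] m = new int[numClusters];
        for (double[] x : batch) {
            int k = closest(x);
            m[k]++;
            for (int j = 0; j < dim; j++) sums[k][j] += x[j];
        }
        for (int k = 0; k < numClusters; k++) {
            double oldWeight = counts[k] * decay;   // discount what was learned before
            double newWeight = oldWeight + m[k];
            counts[k] = newWeight;
            if (m[k] == 0) continue;                // no new points: center unchanged
            for (int j = 0; j < dim; j++) {
                double batchMean = sums[k][j] / m[k];
                centers[k][j] = (centers[k][j] * oldWeight + batchMean * m[k]) / newWeight;
            }
        }
    }
}
```

Saving `centers` and `counts` after a run and passing them to the constructor in the next run plays the same role as handing a saved `KMeansModel` to `setInitialModel`.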
>>>>>>>>
>>>>>>>> With the help of these methods, we can train models again with
>>>>>>>> newly arriving data while keeping the characteristics learned from the
>>>>>>>> previous data. When implementing this, we need to pay attention to
>>>>>>>> other parameters of incremental learning, such as data horizon and data
>>>>>>>> obsolescence (indicated in the project ideas page).
>>>>>>>> We need to discuss how to add these to CEP event streams. I
>>>>>>>> have added Suho to the thread for more clarification.
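One common way to express data obsolescence, and the one MLlib's streaming k-means uses, is a decay factor α applied once per batch: data seen t batches ago keeps weight α^t. MLlib's documentation describes a half-life h such that a batch's contribution drops to one half after h batches, and the geometric sum of the weights gives a rough "effective horizon" in batches. This is a standard derivation offered as one possible interpretation of the terms on the project ideas page, not their definition there.

```latex
\begin{align*}
w_t &= \alpha^{t}, \qquad 0 < \alpha \le 1 \\
\alpha^{h} = \tfrac{1}{2} \;&\Longrightarrow\; \alpha = 0.5^{1/h} = \exp\!\left(\frac{\ln 0.5}{h}\right) \\
\sum_{t=0}^{\infty} \alpha^{t} &= \frac{1}{1-\alpha} \qquad (\alpha < 1)
\end{align*}
```

For example, h = 10 batches gives α = 0.5^(1/10) ≈ 0.933 and an effective horizon of about 1/(1 - 0.933) ≈ 14.9 batches.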
>>>>>>>>
>>>>>>>> Best regards.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Mar 5, 2016 at 5:15 PM, Mahesh Dananjaya <
>>>>>>>> dananjayamah...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Maheshakya,
>>>>>>>>> As we intend to use WSO2 CEP to handle streaming data and implement
>>>>>>>>> the machine learning algorithms with Spark MLlib, is the data stream
>>>>>>>>> taken to ML in the event publisher's format, through the event
>>>>>>>>> publisher? Or can we use the direct traffic that comes to the event
>>>>>>>>> receiver, or else use it as streams? Referring to
>>>>>>>>> https://docs.wso2.com/display/CEP410/User+Guide :
>>>>>>>>>     1.) Is the data coming from WSO2 DAS to ML coming in as streams?
>>>>>>>>>     2.) Are there any incremental learning algorithms currently
>>>>>>>>>         active in ML? You mentioned that there are, and that they are
>>>>>>>>>         in the Scala API. So there is streaming support in that Scala
>>>>>>>>>         API. In that API, in which format is the data acquired by ML?
>>>>>>>>>
>>>>>>>>> Thank you.
>>>>>>>>> BR,
>>>>>>>>> Mahesh.
>>>>>>>>>
>>>>>>>>> On Fri, Mar 4, 2016 at 2:03 PM, Maheshakya Wijewardena <
>>>>>>>>> mahesha...@wso2.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>
>>>>>>>>>> We had to modify the project scope a little to best suit the
>>>>>>>>>> requirements. We will update the project idea with those concerns
>>>>>>>>>> soon and let you know.
>>>>>>>>>>
>>>>>>>>>> We do not support streaming data in WSO2 Machine Learner at the
>>>>>>>>>> moment. The new focus is to use WSO2 CEP to handle streaming data and
>>>>>>>>>> implement the machine learning algorithms with Spark MLlib. You can
>>>>>>>>>> look at the streaming k-means and streaming linear regression
>>>>>>>>>> implementations in MLlib. Currently, that API is only for Scala. What
>>>>>>>>>> we need is to use the Java APIs of k-means and generalized linear
>>>>>>>>>> models to support incremental learning with streaming data. This has
>>>>>>>>>> to be done as mini-batch learning: since these algorithms operate in a
>>>>>>>>>> stochastic-gradient-descent fashion, any learning with new data can be
>>>>>>>>>> done on top of the previously learned models.
>>>>>>>>>> So please go through those APIs [1][2][3] and try to get an idea.
>>>>>>>>>> Also, please try to understand how event streams work in WSO2 CEP
>>>>>>>>>> [4][5].
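The point that SGD-style training lets new data be learned on top of an existing model can be sketched for logistic regression (cf. `LogisticRegressionWithSGD` [3]) as one gradient step per arriving mini-batch. This is a hypothetical JDK-only illustration, not MLlib code: `OnlineLogisticRegression` and its parameters are assumptions. The step size decays as stepSize / sqrt(step), similar in spirit to MLlib's SGD updaters; keeping the step counter as part of the saved state means a resumed model continues that schedule rather than restarting it.

```java
// Hypothetical sketch (not MLlib code): binary logistic regression updated by one
// gradient step per arriving mini-batch. Weights, intercept and the step counter
// are the state that carries over from batch to batch.
final class OnlineLogisticRegression {
    final double[] weights;
    double intercept;
    long steps;               // kept so the step-size schedule resumes, not restarts
    final double stepSize;

    OnlineLogisticRegression(int numFeatures, double stepSize) {
        this.weights = new double[numFeatures];
        this.stepSize = stepSize;
    }

    /** Predicted probability that the label is 1. */
    double probability(double[] x) {
        double z = intercept;
        for (int j = 0; j < weights.length; j++) z += weights[j] * x[j];
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** One gradient step of the mean log loss on a mini-batch; labels are 0.0 or 1.0. */
    void update(double[][] xs, double[] labels) {
        steps++;
        double eta = stepSize / Math.sqrt(steps);   // decaying step size
        double[] gradW = new double[weights.length];
        double gradB = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double err = probability(xs[i]) - labels[i];
            for (int j = 0; j < weights.length; j++) gradW[j] += err * xs[i][j] / xs.length;
            gradB += err / xs.length;
        }
        for (int j = 0; j < weights.length; j++) weights[j] -= eta * gradW[j];
        intercept -= eta * gradB;
    }
}
```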
>>>>>>>>>>
>>>>>>>>>> Best regards.
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html
>>>>>>>>>> [2]
>>>>>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeans.html
>>>>>>>>>> [3]
>>>>>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/classification/LogisticRegressionWithSGD.html
>>>>>>>>>> [4]
>>>>>>>>>> https://docs.wso2.com/display/CEP310/Working+with+Event+Streams
>>>>>>>>>> [5]
>>>>>>>>>> https://docs.wso2.com/display/CEP310/Working+with+Execution+Plans
>>>>>>>>>>
>>>>>>>>>> On Fri, Mar 4, 2016 at 11:26 AM, Mahesh Dananjaya <
>>>>>>>>>> dananjayamah...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Maheshakya,
>>>>>>>>>>> Please give me some time to go through your ML package. Does the
>>>>>>>>>>> current product have any streaming data support? I did some
>>>>>>>>>>> university projects related to machine learning with regression,
>>>>>>>>>>> modelling, factor analysis, cluster analysis and classification
>>>>>>>>>>> problems (discriminant analysis) with SVMs (support vector
>>>>>>>>>>> machines), neural networks, LS classification and ML (maximum
>>>>>>>>>>> likelihood). Please give me some time to see how the WSO2
>>>>>>>>>>> architecture works; then I can come up with a good architecture.
>>>>>>>>>>> Thank you.
>>>>>>>>>>> BR,
>>>>>>>>>>> Mahesh.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Mar 2, 2016 at 2:41 PM, Mahesh Dananjaya <
>>>>>>>>>>> dananjayamah...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Maheshakya,
>>>>>>>>>>>> Thank you for the resources. I will go through them, and I am looking
>>>>>>>>>>>> forward to this proposed project. Thank you.
>>>>>>>>>>>> BR,
>>>>>>>>>>>> Mahesh.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Mar 2, 2016 at 1:52 PM, Maheshakya Wijewardena <
>>>>>>>>>>>> mahesha...@wso2.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you for your interest in this project.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We would like to know what type of similar projects you have
>>>>>>>>>>>>> worked on. You may have seen that WSO2 Machine Learner supports
>>>>>>>>>>>>> several learning algorithms at the moment [1]. This project intends
>>>>>>>>>>>>> to leverage the existing algorithms in WSO2 Machine Learner to
>>>>>>>>>>>>> support streaming data. As a first step, you can get an idea of what
>>>>>>>>>>>>> WSO2 Machine Learner does and how it operates. You can download WSO2
>>>>>>>>>>>>> Machine Learner from the product page [2] and get the source code
>>>>>>>>>>>>> [3]. ML uses Apache Spark MLlib [4] for its algorithms, so it's
>>>>>>>>>>>>> better to read and understand what it does as well.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In order to get an idea about the deliverables and the scope
>>>>>>>>>>>>> of this project, try to understand how Spark Streaming [5] (see the
>>>>>>>>>>>>> examples) handles streaming data. Also, have a look at the streaming
>>>>>>>>>>>>> algorithms [6][7] supported by MLlib. Two approaches to employing
>>>>>>>>>>>>> incremental learning in ML are discussed on the project proposals
>>>>>>>>>>>>> page. These streaming algorithms can be used directly in the first
>>>>>>>>>>>>> approach. For the other approach, your implementation should contain
>>>>>>>>>>>>> a procedure to create mini-batches of relevant sizes from streaming
>>>>>>>>>>>>> data (i.e. a moving window) and to retrain the same algorithm
>>>>>>>>>>>>> periodically.
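The second approach (create mini-batches from the stream with a moving window and retrain periodically) might look roughly like the following JDK-only sketch. `MovingWindowBatcher`, `onEvent`, and the count-based window and retraining interval are hypothetical; they do not model WSO2 CEP's APIs, and a time-based window would work just as well. The retraining callback would be whatever warm-started training routine the implementation uses.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: keep the most recent `windowSize` events (a moving window)
// and hand a snapshot of the window to a retraining callback every `retrainEvery` events.
final class MovingWindowBatcher<T> {
    private final int windowSize;
    private final int retrainEvery;
    private final Consumer<List<T>> retrain;
    private final Deque<T> window = new ArrayDeque<>();
    private long seen = 0;

    MovingWindowBatcher(int windowSize, int retrainEvery, Consumer<List<T>> retrain) {
        if (windowSize <= 0 || retrainEvery <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        this.windowSize = windowSize;
        this.retrainEvery = retrainEvery;
        this.retrain = retrain;
    }

    /** Called once per incoming event, e.g. from a stream receiver. */
    void onEvent(T event) {
        window.addLast(event);
        if (window.size() > windowSize) {
            window.removeFirst();                     // oldest event leaves the window
        }
        seen++;
        if (seen % retrainEvery == 0) {
            retrain.accept(new ArrayList<>(window));  // snapshot, oldest first
        }
    }
}
```

With `windowSize = 5` and `retrainEvery = 3`, events 1 to 9 trigger retraining on [1, 2, 3], [2, 3, 4, 5, 6] and [5, 6, 7, 8, 9].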
>>>>>>>>>>>>>
>>>>>>>>>>>>> To start with the project, you will need to come up with a
>>>>>>>>>>>>> suitable plan and an architecture first.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please watch the video referenced in the proposal (reference:
>>>>>>>>>>>>> 5). It will help you get a better idea about machine learning
>>>>>>>>>>>>> algorithms with streaming data.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Let us know if you need any help with these.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]
>>>>>>>>>>>>> https://docs.wso2.com/display/ML110/Machine+Learner+Algorithms
>>>>>>>>>>>>> [2] http://wso2.com/products/machine-learner/
>>>>>>>>>>>>> [3]
>>>>>>>>>>>>> https://docs.wso2.com/display/ML110/Building+from+Source#BuildingfromSource-Downloadingthesourcecheckout
>>>>>>>>>>>>> [4] https://spark.apache.org/docs/1.4.1/mllib-guide.html
>>>>>>>>>>>>> [5]
>>>>>>>>>>>>> https://spark.apache.org/docs/1.4.1/streaming-programming-guide.html
>>>>>>>>>>>>> [6]
>>>>>>>>>>>>> https://spark.apache.org/docs/1.4.1/mllib-linear-methods.html#streaming-linear-regression
>>>>>>>>>>>>> [7]
>>>>>>>>>>>>> https://spark.apache.org/docs/1.4.1/mllib-clustering.html#streaming-k-means
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Mar 2, 2016 at 1:19 PM, Mahesh Dananjaya <
>>>>>>>>>>>>> dananjayamah...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>> I am interested in contributing to proposal 6, "Predictive
>>>>>>>>>>>>>> analytic with online data for WSO2 Machine Learner", for GSoC this
>>>>>>>>>>>>>> time. Since I have been engaging with some similar projects, I think
>>>>>>>>>>>>>> it will be a great experience for me. Please let me know what you
>>>>>>>>>>>>>> think and what you suggest. I have been going through your
>>>>>>>>>>>>>> documents. Thank you.
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Mahesh Dananjaya.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Dev mailing list
>>>>>>>>>>>>>> Dev@wso2.org
>>>>>>>>>>>>>> http://wso2.org/cgi-bin/mailman/listinfo/dev
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Pruthuvi Maheshakya Wijewardena
>>>>>>>>>>>>> mahesha...@wso2.com
>>>>>>>>>>>>> +94711228855


