Hi Mahesh,

Please submit your final proposal to GSoC before the deadline.

Regards,
Supun

On Mon, Mar 21, 2016 at 1:00 PM, Maheshakya Wijewardena <mahesha...@wso2.com> wrote:

> Hi Mahesh,
>
> The deadline for submitting proposals is March 25th, 2016, so please
> start writing your proposal and get feedback on it.
>
> Best regards.
>
> On Tue, Mar 15, 2016 at 4:14 PM, Mahesh Dananjaya <
> dananjayamah...@gmail.com> wrote:
>
>> Hi Maheshakya,
>> OK. I have been trying some examples, splitting the datasets and training
>> on them incrementally. I am still working on that, and I have been adding
>> the examples to my GitHub repo too: https://github.com/dananjayamahesh/GSOC2016
>> I saw that Spark supports those streaming algorithms only through the Scala
>> API, so my task is to develop a Java API. I will let you know my progress.
>> Thank you very much.
>> BR,
>> Mahesh
>>
>> On Tue, Mar 15, 2016 at 3:21 PM, Maheshakya Wijewardena <
>> mahesha...@wso2.com> wrote:
>>
>>> Hi Mahesh,
>>>
>>> No, you don't need to use Hadoop at any stage of this project. Everything
>>> you need (regarding ML algorithms) is in Spark.
>>> You can also use Spark MLLib's methods to randomly split datasets.
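
A minimal sketch of the random splitting step in plain Java, assuming the dataset fits in an in-memory list. The class and method names are hypothetical, and this is not Spark code; in Spark the equivalent is presumably `randomSplit` on the RDD holding the data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class RandomSplitSketch {
    /** Shuffles a copy of the data with a fixed seed and cuts it into `pieces` nearly equal chunks. */
    static <T> List<List<T>> randomSplit(List<T> data, int pieces, long seed) {
        if (pieces <= 0) {
            throw new IllegalArgumentException("pieces must be positive");
        }
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));
        List<List<T>> chunks = new ArrayList<>();
        int n = shuffled.size();
        for (int i = 0; i < pieces; i++) {
            // Integer boundaries spread any remainder over the later chunks.
            int from = (int) ((long) i * n / pieces);
            int to = (int) ((long) (i + 1) * n / pieces);
            chunks.add(new ArrayList<>(shuffled.subList(from, to)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(i);
        }
        for (List<Integer> chunk : randomSplit(rows, 3, 42L)) {
            System.out.println(chunk);
        }
    }
}
```

Every row lands in exactly one chunk, so the chunks can be fed to the training runs one after another.
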
>>>
>>> Best regards.
>>>
>>> On Mon, Mar 14, 2016 at 1:28 PM, Mahesh Dananjaya <
>>> dananjayamah...@gmail.com> wrote:
>>>
>>>> Hi Maheshakya,
>>>> I am writing some Java programs that break the dataset into
>>>> several pieces and train a model repeatedly with those pieces using
>>>> Spark MLLib. Do I have to do anything with Hadoop at this stage? I am
>>>> working in standalone mode. Thank you.
>>>> BR,
>>>> Mahesh.
>>>>
>>>> On Sun, Mar 13, 2016 at 6:30 PM, Maheshakya Wijewardena <
>>>> mahesha...@wso2.com> wrote:
>>>>
>>>>> Hi Mahesh,
>>>>>
>>>>> You don't have to look into carbon-ml.
>>>>>
>>>>> Best regards.
>>>>>
>>>>> On Sun, Mar 13, 2016 at 5:49 PM, Mahesh Dananjaya <
>>>>> dananjayamah...@gmail.com> wrote:
>>>>>
>>>>>> Hi Maheshakya,
>>>>>> I am working on some examples related to Spark and ML. Is there
>>>>>> anything I need to do with carbon-ml? I think I don't need to look into
>>>>>> that one, do I?
>>>>>> BR,
>>>>>> Mahesh
>>>>>>
>>>>>> On Tue, Mar 8, 2016 at 11:55 AM, Maheshakya Wijewardena <
>>>>>> mahesha...@wso2.com> wrote:
>>>>>>
>>>>>>> Hi Mahesh,
>>>>>>>
>>>>>>>> Is that Scala API included in your current product or repo?
>>>>>>>
>>>>>>>
>>>>>>> No, we don't have the Scala API included. What we want is to design
>>>>>>> Java implementations of those algorithms that train with mini-batches
>>>>>>> of streaming data with the help of the aforementioned methods, so that
>>>>>>> we can include them as a CEP extension.
>>>>>>>
>>>>>>> To clarify, please try to write a simple Java program using Spark
>>>>>>> MLLib linear regression and k-means clustering with a sample data set
>>>>>>> (you can find a lot of data sets in the UCI repo[1]). You need to break
>>>>>>> the dataset into several pieces and train a model repeatedly with those.
>>>>>>> After each training run, save the model information (such as the
>>>>>>> weights and intercept for regression and the cluster centers for
>>>>>>> clustering; please check the arguments of the methods I have mentioned
>>>>>>> and save the information they require).
>>>>>>> When training a model with a new piece of data, use those methods to
>>>>>>> initialize the model, passing the saved values as the arguments. This
>>>>>>> way you can start from where you stopped in the previous run.
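
The save-and-resume loop described above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the Spark MLLib API: the class `IncrementalLinearRegressionSketch`, its `Model` holder and `trainOnBatch` are hypothetical names, the data is synthetic (y = 2x + 1), and each piece is fitted with ordinary full-batch gradient descent rather than Spark's sampled SGD.

```java
class IncrementalLinearRegressionSketch {
    /** Saved model state: the only information carried from one training run to the next. */
    static final class Model {
        final double[] weights;
        final double intercept;

        Model(double[] weights, double intercept) {
            this.weights = weights.clone();
            this.intercept = intercept;
        }

        double predict(double[] x) {
            double y = intercept;
            for (int j = 0; j < weights.length; j++) {
                y += weights[j] * x[j];
            }
            return y;
        }
    }

    /** Runs gradient descent on one piece of data, starting from the previously saved model. */
    static Model trainOnBatch(Model start, double[][] xs, double[] ys, int iterations, double stepSize) {
        double[] w = start.weights.clone();
        double b = start.intercept;
        int n = xs.length;
        for (int it = 0; it < iterations; it++) {
            double[] gradW = new double[w.length];
            double gradB = 0.0;
            for (int i = 0; i < n; i++) {
                // Prediction error of the current model on sample i.
                double err = b - ys[i];
                for (int j = 0; j < w.length; j++) {
                    err += w[j] * xs[i][j];
                }
                for (int j = 0; j < w.length; j++) {
                    gradW[j] += err * xs[i][j];
                }
                gradB += err;
            }
            for (int j = 0; j < w.length; j++) {
                w[j] -= stepSize * gradW[j] / n;
            }
            b -= stepSize * gradB / n;
        }
        return new Model(w, b);
    }

    public static void main(String[] args) {
        // Synthetic data for y = 2x + 1, broken into three pieces.
        double[][][] pieces = {
            {{0.0}, {1.0}, {2.0}},
            {{3.0}, {4.0}, {5.0}},
            {{0.5}, {2.5}, {4.5}}
        };
        Model model = new Model(new double[] {0.0}, 0.0);
        for (double[][] xs : pieces) {
            double[] ys = new double[xs.length];
            for (int i = 0; i < xs.length; i++) {
                ys[i] = 2.0 * xs[i][0] + 1.0;
            }
            // The model returned by the previous run is the starting point of this one.
            model = trainOnBatch(model, xs, ys, 2000, 0.05);
            System.out.println("w=" + model.weights[0] + " b=" + model.intercept);
        }
    }
}
```

The point of the sketch is the data flow: nothing but the weights and the intercept survives between runs, which is the information the mentioned methods take as arguments.
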
>>>>>>>
>>>>>>> Let us know your observations and feel free to ask if you need to
>>>>>>> know anything more on this.
>>>>>>>
>>>>>>> We'll let you know what needs to be done to include this in CEP.
>>>>>>>
>>>>>>> Best regards.
>>>>>>>
>>>>>>> On Tue, Mar 8, 2016 at 10:59 AM, Mahesh Dananjaya <
>>>>>>> dananjayamah...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Maheshakya,
>>>>>>>> Great, thank you. I already have ML and CEP and am working more
>>>>>>>> towards it. Is that Scala API included in your current product or repo?
>>>>>>>> Thank you.
>>>>>>>> BR,
>>>>>>>> Mahesh.
>>>>>>>>
>>>>>>>> On Sun, Mar 6, 2016 at 5:49 PM, Maheshakya Wijewardena <
>>>>>>>> mahesha...@wso2.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Mahesh,
>>>>>>>>>
>>>>>>>>> Please find the comments inline.
>>>>>>>>>
>>>>>>>>>> Is the data stream passed to ML in the event publisher's format,
>>>>>>>>>> through an event publisher? Or can we use the direct traffic that
>>>>>>>>>> comes to the event receiver, or else take it as streams?
>>>>>>>>>>
>>>>>>>>> We intend to use the direct data as event streams.
>>>>>>>>>
>>>>>>>>>> 1.) Does the data coming from WSO2 DAS to ML arrive as streams?
>>>>>>>>>>
>>>>>>>>> No, WSO2 ML doesn't use any event streams. The data stored in tables
>>>>>>>>> in DAS is loaded into ML.
>>>>>>>>>
>>>>>>>>>> 2.) Are there any incremental learning algorithms currently active
>>>>>>>>>> in ML? You mentioned that there are, and that they come with a Scala
>>>>>>>>>> API. So there is streaming support with that Scala API. In what format
>>>>>>>>>> does that API take data into ML?
>>>>>>>>>>
>>>>>>>>> No, there are no incremental learning algorithms in ML. The Scala
>>>>>>>>> API belongs to Spark MLLib. MLLib supports streaming k-means and
>>>>>>>>> generalized linear models (linear regression variants and logistic
>>>>>>>>> regression) through its Scala API. Those implementations basically
>>>>>>>>> retrain the already trained models with mini-batches as data arrives
>>>>>>>>> sequentially. There, the streaming data is broken into mini-batches
>>>>>>>>> with the help of Spark Streaming. But we do not intend to use Spark
>>>>>>>>> Streaming in our implementation. What we need to do is implement
>>>>>>>>> similar behavior for event streams using the Java API. The Java API
>>>>>>>>> has the following methods:
>>>>>>>>>
>>>>>>>>>    - createModel(Vector weights, double intercept) - for GLMs
>>>>>>>>>      <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html#createModel%28org.apache.spark.mllib.linalg.Vector,%20double%29>
>>>>>>>>>      Vector:
>>>>>>>>>      <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/linalg/Vector.html>
>>>>>>>>>    - setInitialModel(KMeansModel model) - for k-means
>>>>>>>>>      <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeans.html#setInitialModel%28org.apache.spark.mllib.clustering.KMeansModel%29>
>>>>>>>>>      KMeansModel:
>>>>>>>>>      <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeansModel.html>
>>>>>>>>>
>>>>>>>>> With the help of these methods, we can train models again with
>>>>>>>>> newly arriving data while keeping the characteristics learned from the
>>>>>>>>> previous data. When implementing this, we need to pay attention to other
>>>>>>>>> parameters of incremental learning, such as data horizon and data
>>>>>>>>> obsolescence (indicated in the project ideas page).
>>>>>>>>> We need to discuss how to integrate these with CEP event streams. I
>>>>>>>>> have added Suho to the thread for more clarification.
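
For the k-means case, seeding a new run with a previously trained model can be sketched in plain Java as below. The class `WarmStartKMeansSketch` and its methods are hypothetical and do not use Spark; the sketch only shows the saved cluster centers being reused as the starting point for the next batch. It does not weigh old data against new data, so data horizon and data obsolescence are left out.

```java
import java.util.Arrays;

class WarmStartKMeansSketch {
    /** Index of the center closest to the point (squared Euclidean distance). */
    static int nearest(double[][] centers, double[] p) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double d = 0.0;
            for (int j = 0; j < p.length; j++) {
                double diff = centers[c][j] - p[j];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    /** Lloyd iterations on one batch, seeded with the centers saved from the previous run. */
    static double[][] refine(double[][] initialCenters, double[][] batch, int iterations) {
        int k = initialCenters.length;
        int dim = initialCenters[0].length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = initialCenters[c].clone();
        }
        for (int it = 0; it < iterations; it++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (double[] p : batch) {
                int c = nearest(centers, p);
                counts[c]++;
                for (int j = 0; j < dim; j++) {
                    sums[c][j] += p[j];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous center
                }
                for (int j = 0; j < dim; j++) {
                    centers[c][j] = sums[c][j] / counts[c];
                }
            }
        }
        return centers;
    }

    public static void main(String[] args) {
        double[][] centers = {{0.0, 0.0}, {10.0, 10.0}};
        double[][] firstBatch = {{1.0, 1.0}, {1.0, 2.0}, {9.0, 9.0}, {10.0, 8.0}};
        double[][] secondBatch = {{2.0, 1.0}, {2.0, 2.0}, {8.0, 9.0}, {8.0, 10.0}};
        centers = refine(centers, firstBatch, 5);
        System.out.println(Arrays.deepToString(centers));
        // The second run starts from the centers learned on the first batch.
        centers = refine(centers, secondBatch, 5);
        System.out.println(Arrays.deepToString(centers));
    }
}
```
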
>>>>>>>>>
>>>>>>>>> Best regards.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, Mar 5, 2016 at 5:15 PM, Mahesh Dananjaya <
>>>>>>>>> dananjayamah...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Maheshakya,
>>>>>>>>>> Since we plan to use WSO2 CEP to handle streaming data and
>>>>>>>>>> implement the machine learning algorithms with Spark MLLib, is the data
>>>>>>>>>> stream passed to ML in the event publisher's format, through an event
>>>>>>>>>> publisher? Or can we use the direct traffic that comes to the event
>>>>>>>>>> receiver, or else take it as streams? I am referring to
>>>>>>>>>> https://docs.wso2.com/display/CEP410/User+Guide
>>>>>>>>>>     1.) Does the data coming from WSO2 DAS to ML arrive as streams?
>>>>>>>>>>     2.) Are there any incremental learning algorithms currently
>>>>>>>>>> active in ML? You mentioned that there are, and that they come with a
>>>>>>>>>> Scala API. So there is streaming support with that Scala API. In what
>>>>>>>>>> format does that API take data into ML?
>>>>>>>>>>
>>>>>>>>>> thank you.
>>>>>>>>>> BR,
>>>>>>>>>> Mahesh.
>>>>>>>>>>
>>>>>>>>>> On Fri, Mar 4, 2016 at 2:03 PM, Maheshakya Wijewardena <
>>>>>>>>>> mahesha...@wso2.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>>
>>>>>>>>>>> We had to modify the project scope a little to best suit the
>>>>>>>>>>> requirements. We will update the project idea with those changes soon
>>>>>>>>>>> and let you know.
>>>>>>>>>>>
>>>>>>>>>>> We do not support streaming data in WSO2 Machine Learner at the
>>>>>>>>>>> moment. The new focus is to use WSO2 CEP to handle streaming data and
>>>>>>>>>>> implement the machine learning algorithms with Spark MLLib. You can
>>>>>>>>>>> look at the streaming k-means and streaming linear regression
>>>>>>>>>>> implementations in MLLib. Currently, that API is available only for
>>>>>>>>>>> Scala. We need the Java APIs of k-means and generalized linear models
>>>>>>>>>>> to support incremental learning with streaming data. This has to be
>>>>>>>>>>> done as mini-batch learning, since these algorithms operate as
>>>>>>>>>>> stochastic gradient descent, so that any learning with new data can be
>>>>>>>>>>> done on top of the previously learned models.
>>>>>>>>>>> So please go through those APIs[1][2][3] and try to get an idea.
>>>>>>>>>>> Also please try to understand how event streams work in WSO2 CEP
>>>>>>>>>>> [4][5].
>>>>>>>>>>>
>>>>>>>>>>> Best regards.
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html
>>>>>>>>>>> [2]
>>>>>>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeans.html
>>>>>>>>>>> [3]
>>>>>>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/classification/LogisticRegressionWithSGD.html
>>>>>>>>>>> [4]
>>>>>>>>>>> https://docs.wso2.com/display/CEP310/Working+with+Event+Streams
>>>>>>>>>>> [5]
>>>>>>>>>>> https://docs.wso2.com/display/CEP310/Working+with+Execution+Plans
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Mar 4, 2016 at 11:26 AM, Mahesh Dananjaya <
>>>>>>>>>>> dananjayamah...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Maheshakya,
>>>>>>>>>>>> Give me some time to go through your ML package. Does the current
>>>>>>>>>>>> product have any streaming data support? I did some university
>>>>>>>>>>>> projects related to machine learning, covering regression, modelling,
>>>>>>>>>>>> factor analysis, cluster analysis and classification problems
>>>>>>>>>>>> (discriminant analysis) with SVMs (support vector machines), neural
>>>>>>>>>>>> networks, LS classification and ML (maximum likelihood). Give me some
>>>>>>>>>>>> time to see how the WSO2 architecture works; then I can come up with a
>>>>>>>>>>>> good architecture. Thank you.
>>>>>>>>>>>> BR,
>>>>>>>>>>>> Mahesh.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Mar 2, 2016 at 2:41 PM, Mahesh Dananjaya <
>>>>>>>>>>>> dananjayamah...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Maheshakya,
>>>>>>>>>>>>> Thank you for the resources. I will go through them, and I am
>>>>>>>>>>>>> looking forward to this proposed project. Thank you.
>>>>>>>>>>>>> BR,
>>>>>>>>>>>>> Mahesh.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Mar 2, 2016 at 1:52 PM, Maheshakya Wijewardena <
>>>>>>>>>>>>> mahesha...@wso2.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you for your interest in this project.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We would like to know what kind of similar projects you have
>>>>>>>>>>>>>> worked on. You may have seen that WSO2 Machine Learner supports
>>>>>>>>>>>>>> several learning algorithms at the moment[1]. This project intends to
>>>>>>>>>>>>>> leverage the existing algorithms in WSO2 Machine Learner to support
>>>>>>>>>>>>>> streaming data. As a first step, you can get an idea of what WSO2
>>>>>>>>>>>>>> Machine Learner does and how it operates. You can download WSO2
>>>>>>>>>>>>>> Machine Learner from the product page[2] and get the source code[3].
>>>>>>>>>>>>>> ML uses Apache Spark MLLib[4] for its algorithms, so it's better to
>>>>>>>>>>>>>> read and understand what it does as well.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To get an idea of the deliverables and the scope of this
>>>>>>>>>>>>>> project, try to understand how Spark Streaming[5] (see the examples)
>>>>>>>>>>>>>> handles streaming data. Also, have a look at the streaming
>>>>>>>>>>>>>> algorithms[6][7] supported by MLLib. The project proposals page
>>>>>>>>>>>>>> discusses two approaches to incremental learning in ML. These
>>>>>>>>>>>>>> streaming algorithms can be used directly in the first approach. For
>>>>>>>>>>>>>> the other approach, your implementation should contain a procedure
>>>>>>>>>>>>>> that creates mini-batches of a relevant size from the streaming data
>>>>>>>>>>>>>> (i.e. a moving window) and periodically retrains the same algorithm.
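
A minimal sketch of the second approach's batching step, in plain Java with no Spark or CEP dependency. `MovingWindowBatcher`, `windowSize` and `retrainEvery` are hypothetical names chosen for illustration: the window keeps the most recent events, and a snapshot of it is handed out for retraining after every fixed number of new events.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

class MovingWindowBatcher<T> {
    private final int windowSize;   // how many recent events are kept
    private final int retrainEvery; // a batch is emitted after this many new events
    private final Deque<T> window = new ArrayDeque<>();
    private int sinceLastBatch = 0;

    MovingWindowBatcher(int windowSize, int retrainEvery) {
        if (windowSize <= 0 || retrainEvery <= 0) {
            throw new IllegalArgumentException("windowSize and retrainEvery must be positive");
        }
        this.windowSize = windowSize;
        this.retrainEvery = retrainEvery;
    }

    /** Adds one event; returns a snapshot of the window when retraining is due. */
    Optional<List<T>> offer(T event) {
        window.addLast(event);
        if (window.size() > windowSize) {
            window.removeFirst(); // the oldest event falls out of the window
        }
        sinceLastBatch++;
        if (sinceLastBatch < retrainEvery) {
            return Optional.empty();
        }
        sinceLastBatch = 0;
        List<T> snapshot = new ArrayList<>(window);
        return Optional.of(snapshot);
    }

    public static void main(String[] args) {
        MovingWindowBatcher<Integer> batcher = new MovingWindowBatcher<>(4, 3);
        for (int event = 1; event <= 7; event++) {
            // In a real setup the batch would be passed to the retraining step.
            batcher.offer(event).ifPresent(batch -> System.out.println("retrain on " + batch));
        }
    }
}
```

Choosing `windowSize` larger than `retrainEvery` makes consecutive batches overlap, so each retraining run still sees some of the older events.
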
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To start with the project, you will need to come up with a
>>>>>>>>>>>>>> suitable plan and an architecture first.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please watch the video referenced in the proposal (reference:
>>>>>>>>>>>>>> 5). It will help you get a better idea of machine learning
>>>>>>>>>>>>>> algorithms with streaming data.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Let us know if you need any help with these.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>> https://docs.wso2.com/display/ML110/Machine+Learner+Algorithms
>>>>>>>>>>>>>> [2] http://wso2.com/products/machine-learner/
>>>>>>>>>>>>>> [3]
>>>>>>>>>>>>>> https://docs.wso2.com/display/ML110/Building+from+Source#BuildingfromSource-Downloadingthesourcecheckout
>>>>>>>>>>>>>> [4] https://spark.apache.org/docs/1.4.1/mllib-guide.html
>>>>>>>>>>>>>> [5]
>>>>>>>>>>>>>> https://spark.apache.org/docs/1.4.1/streaming-programming-guide.html
>>>>>>>>>>>>>> [6]
>>>>>>>>>>>>>> https://spark.apache.org/docs/1.4.1/mllib-linear-methods.html#streaming-linear-regression
>>>>>>>>>>>>>> [7]
>>>>>>>>>>>>>> https://spark.apache.org/docs/1.4.1/mllib-clustering.html#streaming-k-means
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Mar 2, 2016 at 1:19 PM, Mahesh Dananjaya <
>>>>>>>>>>>>>> dananjayamah...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>> I am interested in contributing to proposal 6: "Predictive
>>>>>>>>>>>>>>> analytic with online data for WSO2 Machine Learner" for GSOC2 this time.
>>>>>>>>>>>>>>> Since I have been engaged in some similar projects, I think it will be
>>>>>>>>>>>>>>> a great experience for me. Please let me know what you think and what
>>>>>>>>>>>>>>> you suggest. I have been going through your documents. Thank you.
>>>>>>>>>>>>>>> regards,
>>>>>>>>>>>>>>> Mahesh Dananjaya.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>> Dev mailing list
>>>>>>>>>>>>>>> Dev@wso2.org
>>>>>>>>>>>>>>> http://wso2.org/cgi-bin/mailman/listinfo/dev
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Pruthuvi Maheshakya Wijewardena
>>>>>>>>>>>>>> mahesha...@wso2.com
>>>>>>>>>>>>>> +94711228855


-- 
*Supun Sethunga*
Software Engineer
WSO2, Inc.
http://wso2.com/
lean | enterprise | middleware
Mobile : +94 716546324