Hi Misgana,

> I committed the code for reading a csv file. My next task will be sampling
> and starting to implement an ensemble method(Stacking).

I went through the code and would like to suggest one small thing. Most of
the Spark algorithms need JavaRDDs as the input for datasets. Hence, reading
your file as a JavaRDD<LabeledPoint> is a better approach than reading it
as a list of labeled points. Please refer to [1] and [2] for an example.
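
To make the suggestion concrete, here is a minimal sketch of the per-line
parsing step in plain Java. The names CsvLineParser, Row, parseLine and
labelIndex are made up for illustration and are not taken from MLUtils; the
sketch also assumes every column is numeric. The Spark calls appear only in a
comment, as one possible way to feed the parser into an RDD.

```java
import java.util.Arrays;

// Hypothetical helper: splits one CSV line into a label and a feature array.
class CsvLineParser {

    /** Parsed row: the label plus the remaining columns as features. */
    static final class Row {
        final double label;
        final double[] features;

        Row(double label, double[] features) {
            this.label = label;
            this.features = features;
        }
    }

    /**
     * Parses a comma-separated line of numeric values.
     * labelIndex is the zero-based column holding the label (an assumption
     * of this sketch) and must be a valid column index.
     */
    static Row parseLine(String line, int labelIndex) {
        String[] tokens = line.split(",");
        double[] features = new double[tokens.length - 1];
        double label = 0.0;
        int f = 0;
        for (int i = 0; i < tokens.length; i++) {
            double value = Double.parseDouble(tokens[i].trim());
            if (i == labelIndex) {
                label = value;
            } else {
                features[f++] = value;
            }
        }
        return new Row(label, features);
    }

    public static void main(String[] args) {
        Row row = parseLine("5.1, 3.5, 1.4, 0.2, 1", 4);
        System.out.println(row.label + " " + Arrays.toString(row.features));
        // With Spark on the classpath, the same parser could be applied to
        // every line of the file instead of building a list, e.g.:
        //   JavaRDD<LabeledPoint> points = sc.textFile(path).map(line -> {
        //       Row r = parseLine(line, 4);
        //       return new LabeledPoint(r.label, Vectors.dense(r.features));
        //   });
    }
}
```

Keeping the parsing in a small function like this means the rows are parsed
by Spark where the data lives, rather than being collected into a list first.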

> -  How to decide which models to use for an ensemble and which parameters?

The type of model/algorithm has to be a user input. The parameters will
depend on the algorithm the user picks.

> - Should the ensemble methods be implemented as a wrapper around the
> base-models?

Yes. You can use the existing algorithms in WSO2 Machine Learner as the
base-models. (I shared those in my previous mail.)
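
As a rough illustration of the wrapper idea, here is a minimal sketch in plain
Java. Everything in it is hypothetical: the BaseModel interface and the two toy
models merely stand in for the real algorithms, and none of the names come from
carbon-ml. For brevity the meta-model is trained on predictions made on the
training data itself; a proper Stacking implementation would use held-out or
cross-validated predictions instead.

```java
import java.util.Arrays;
import java.util.List;

// Stacking as a wrapper around base models; all names are illustrative.
class StackingSketch {

    // Hypothetical interface standing in for the wrapped algorithms.
    interface BaseModel {
        void train(double[][] features, double[] labels);
        double predict(double[] features);
    }

    // Toy base model: always predicts the mean training label.
    static class MeanModel implements BaseModel {
        private double mean;
        public void train(double[][] features, double[] labels) {
            mean = Arrays.stream(labels).average().orElse(0.0);
        }
        public double predict(double[] features) { return mean; }
    }

    // Toy base model: predicts the label of the nearest training point.
    static class NearestNeighbourModel implements BaseModel {
        private double[][] xs;
        private double[] ys;
        public void train(double[][] features, double[] labels) {
            xs = features;
            ys = labels;
        }
        public double predict(double[] features) {
            int best = 0;
            double bestDist = Double.MAX_VALUE;
            for (int i = 0; i < xs.length; i++) {
                double d = 0.0;
                for (int j = 0; j < features.length; j++) {
                    double diff = xs[i][j] - features[j];
                    d += diff * diff;
                }
                if (d < bestDist) { bestDist = d; best = i; }
            }
            return ys[best];
        }
    }

    // The ensemble is itself a BaseModel, so callers treat it like any other.
    static class StackingEnsemble implements BaseModel {
        private final List<BaseModel> baseModels;
        private final BaseModel metaModel;

        StackingEnsemble(List<BaseModel> baseModels, BaseModel metaModel) {
            this.baseModels = baseModels;
            this.metaModel = metaModel;
        }

        // One prediction per base model; these are the meta-model's features.
        private double[] basePredictions(double[] features) {
            double[] out = new double[baseModels.size()];
            for (int m = 0; m < out.length; m++) {
                out[m] = baseModels.get(m).predict(features);
            }
            return out;
        }

        public void train(double[][] features, double[] labels) {
            for (BaseModel model : baseModels) {
                model.train(features, labels);
            }
            // Simplification: meta-features come from the training data itself.
            double[][] metaFeatures = new double[features.length][];
            for (int i = 0; i < features.length; i++) {
                metaFeatures[i] = basePredictions(features[i]);
            }
            metaModel.train(metaFeatures, labels);
        }

        public double predict(double[] features) {
            return metaModel.predict(basePredictions(features));
        }
    }

    public static void main(String[] args) {
        double[][] x = {{0.0}, {1.0}, {10.0}, {11.0}};
        double[] y = {0.0, 0.0, 1.0, 1.0};
        BaseModel ensemble = new StackingEnsemble(
                Arrays.asList(new MeanModel(), new NearestNeighbourModel()),
                new NearestNeighbourModel());
        ensemble.train(x, y);
        System.out.println(ensemble.predict(new double[] {9.5}));
    }
}
```

Since the base models and the meta-model are passed in through the
constructor, the choice of algorithms stays with the caller, which fits the
idea of the algorithm type being a user input.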


> - Which library to use for matrix operations? Is Apache
> commons.math.Linearalgebra ok?

Yes, Apache commons.math.* would be fine. In fact, you can use any library
with an open-source licence.


> What do you think about a hangout session to clarify stuff and get to know
> each other.:)

Of course! Please pick a time slot (hopefully one that is friendly to the
IST time zone, GMT+5:30 :) ) and send me a calendar invite.


[1]
https://github.com/wso2/carbon-ml/blob/master/components/ml/org.wso2.carbon.ml.core/src/main/java/org/wso2/carbon/ml/core/utils/MLUtils.java#L58
[2]
https://github.com/wso2/carbon-ml/blob/master/components/ml/org.wso2.carbon.ml.core/src/main/java/org/wso2/carbon/ml/core/spark/algorithms/SupervisedSparkModelBuilder.java#L87

Regards,
Supun

On Sat, May 7, 2016 at 8:46 PM, Misgana Negassi <negas...@tf.uni-freiburg.de> wrote:

>
> Hi Supun,
> I committed the code for reading a csv file. My next task will be sampling
> and starting to implement an ensemble method(Stacking).
> I have some questions about:
> -  How to decide which models to use for an ensemble and which parameters?
> - Should the ensemble methods be implemented as a wrapper around the
> base-models?
> - Which library to use for matrix operations? Is Apache
> commons.math.Linearalgebra ok?
>
> What do you think about a hangout session to clarify stuff and get to know
> each other.:)
>
> Have a nice weekend!
> Misgana
>
>
> On 05.05.2016 19:59, Misgana Negassi wrote:
>
> Hi Supun,
> Thank you for the detailed explanation.
>
> I switched to IntelliJ IDEA as an IDE on Ubuntu 14.04. The import errors
> were resolved after the project was imported as a maven project. This took a
> while because of persistent pom.xml errors -- I am still in the process of
> reading about maven, Spark, REST and the carbon architecture.
> I have created an independent maven project [1] for the implementation of
> the ensemble methods. Currently I am writing code for reading a CSV file
> using the Apache library and converting it into a Java RDD. I will commit
> once I am done with it.
>
>
>
> Regards,
> Misgana
>
>
> [1]https://github.com/zemoel/ensemble-methods
>
>
> On 04.05.2016 06:33, Supun Sethunga wrote:
>
> Hi Misgana,
>
> It seems you have misunderstood the "carbon" architecture. Let me explain it.
> The *carbon-ml* repo contains the source code of the OSGi bundles (i.e. jar
> libraries) which contain the actual implementation/logic (such as
> importing datasets, creating projects, building models, etc.). Hence,
> these are similar to any other third-party library, which
> you have to invoke through its APIs. (They don't have main classes; they
> have APIs.) This repo also contains a REST API, which exposes the
> above-mentioned APIs as a RESTful service. It also includes the source code
> for the UI, which invokes those REST APIs on behalf of the user. Please refer
> to [1] to get the overall idea.
> The *product-ml* repo, in contrast, contains the source code that collects the
> necessary libraries (such as the libraries of carbon-ml, the REST API and the
> UI) and bundles them all together to create the final binary distribution.
>
> *carbon-ml* already contains implementations of a number of algorithms
> [2]. Your ultimate goal is to add three more such implementations to that
> repo (i.e. the three ensemble methods). In doing so, you don't need to
> re-implement the logic for importing datasets, creating projects, etc.,
> as that has already been implemented and you can use those methods through
> their APIs. (Please refer to any of the current algorithms to get an idea.)
>
> But, since it can be difficult to implement the ensemble logic and
> integrate it into the carbon-ml repo *at the same time*, we recommend that you
> *first implement your logic in a separate java client.* This has to be an
> independent maven project. The whole purpose of this java client is to
> implement your logic independently and test its functionality and accuracy.
> You can use the native Spark MLlib libraries for this. In the java client,
> the following steps need to be done:
>
>    - Read the CSV file.
>    - Do the sampling as needed. (train set and test set)
>    - Train an ensemble model using the train set.
>    - Do prediction on the test set and evaluate the accuracy.
>
> (Do not worry about the project concept at this point)
>
> Once you are satisfied with the results, you can integrate that logic
> into the carbon-ml repo (into [2]).
>
> Please push whatever code you write for this java client
> to Git, and share it with us too.
>
>> Importing classes e.g " import
>> org.wso2.carbon.ml.core.interfaces.MLModelBuilder"
>> doesn't resolve.  I tried to solve issue by clean project, or adding the
>> project to the Spark build_path with no success.
>
> Did you get this when you tried to import carbon-ml into Eclipse? Did you
> import it as a maven project?
>
>
>> In another topic:
>> I downloaded the machine learner(binary) following the instructions.
>> After logging in with admin, I didn't get the interface as it is explained
>> in the
>> "Building Your First Predictive Model with WSO2 Machine Learner" tutorial.
>
> Can you please share a screenshot of what you get once you log in? Do you
> see any errors in the console? If so, can you please share those too?
>
> Hope I have answered all your questions.
>
> [1] https://docs.wso2.com/display/ML110/Architecture
> [2]
> https://github.com/wso2/carbon-ml/tree/master/components/ml/org.wso2.carbon.ml.core/src/main/java/org/wso2/carbon/ml/core/spark/algorithms
>
> Regards,
> Supun
>
> On Tue, May 3, 2016 at 9:08 PM, Misgana Negassi <negas...@tf.uni-freiburg.de> wrote:
>
>> Hi Supun,
>>
>> I did "maven clean install" and downloaded the product-ml source code as
>> well.
>>
>> My understanding of this workflow is that I create a new project for
>> reading a csv file and import an ensemble model which was built with
>> Spark (e.g. Random Forest) and also import the
>> SupervisedSparkModelBuilder. (Am I right?)
>> I did these steps and am currently trying to solve the following issues:
>> - Importing classes e.g " import
>> org.wso2.carbon.ml.core.interfaces.MLModelBuilder" doesn't resolve.  I
>> tried to solve issue by clean project, or adding the project to the Spark
>> build_path with no success.
>> - Should I create my standalone project as a maven project or a normal java
>> project? Inside core or as an independent project?
>> - I couldn't find a main class where I can run and see the output of the models.
>>
>> In another topic:
>> I downloaded the machine learner(binary) following the instructions.
>> After logging in with admin, I didn't get the interface as it is explained
>> in the
>> "Building Your First Predictive Model with WSO2 Machine Learner" tutorial.
>>
>>
>> My apologies for asking so many questions. It has been a while since I
>> worked with Eclipse and Java.
>>
>> Best,
>> Misgana
>>
>>
>>
>>
>> On 03.05.2016 06:28, Supun Sethunga wrote:
>>
>> Hi Misgana,
>>
>> Any update on the progress?
>>
>> Regards,
>>
>> On Thu, Apr 28, 2016 at 7:10 PM, Supun Sethunga <sup...@wso2.com> wrote:
>>
>>> Hi Misgana,
>>>
>>> Please find the answers inline.
>>>
>>>> 1. Do I need only to work with carbon-ml repo or should the whole kernel
>>>> be installed?
>>>
>>> You don't need to build the kernel. Building carbon-ml [1] and then
>>> product-ml [2] would be enough.
>>>
>>>> The Build from source Documentation site varies from what i have
>>>> setup(uses svn, downloads whole kernel). Should i follow this?
>>>
>>> Just download the source code ([1] and [2]), and execute "mvn clean
>>> install" from the source directory. As I mentioned earlier, there is no
>>> need to download or build the Carbon Kernel.
>>>
>>>> Could  you suggest on how to setup my dev environment? Currently I
>>>> installed Spark, converted my project to a maven project. But maven seems
>>>> not to properly compile.
>>>
>>> There is no need to install Spark. Spark is only used as an external
>>> library (jars). If you are using Eclipse/IntelliJ IDEA, import the source
>>> code as a maven project. The IDE will automatically resolve the dependencies.
>>>
>>>> I implemented Gradientboosted to core/spark/Algorithms.
>>>
>>> What's the purpose of implementing Gradientboosted?
>>>
>>>> Next step would be to modify *SupervisedSparkModelBuilder.*
>>>
>>> I think it would be easier for you to first (once you have finished the
>>> above steps) write a simple standalone java client, which reads a simple
>>> dataset (a csv file) and builds the ensemble model with Spark. Then you can
>>> integrate that logic into the SupervisedSparkModelBuilder, and eventually
>>> into the model-building workflow of WSO2 ML.
>>>
>>> [1] https://github.com/wso2/carbon-ml
>>> [2] https://github.com/wso2/product-ml
>>> [3] https://docs.wso2.com/display/ML110/Building+from+Source
>>>
>>> Regards,
>>> Supun
>>>
>>>
>>> On Thu, Apr 28, 2016 at 5:22 PM, misgana <misgananega...@gmail.com> wrote:
>>>
>>>> Hi Supun,
>>>>
>>>> My current workflow looks like this:
>>>> 1. Fork and clone carbon-ml repo from GitHub  -- DONE
>>>> 2. Setup Dev environment -- IN PROGRESS
>>>> 3. Integrate GradientBoosted tree algorithm to carbon-ml  -- IN PROGRESS
>>>>
>>>> Issues:
>>>> 1. Do I need only to work with carbon-ml repo or should the whole
>>>> kernel be installed?
>>>> 2. The Build from source Documentation site varies from what i have
>>>> setup(uses svn, downloads whole kernel). Should i follow this?
>>>> 3. Could  you suggest on how to setup my dev environment? Currently I
>>>> installed Spark, converted my project to a maven project. But maven seems
>>>> not to properly compile.
>>>> 4. I implemented Gradientboosted to core/spark/Algorithms. Next step
>>>> would be to modify *SupervisedSparkModelBuilder.*
>>>> 5. Here I would check how this would
>>>> be integrated in the whole framework and test on Iris dataset. (On this I
>>>> need to do some reading)
>>>>
>>>> I would very much appreciate your guidance on this plan/work in progress.
>>>>
>>>> Best,
>>>> Misgana
>>>>
>>>>
>>>>
>>>> On 26.04.2016 09:26, Misgana Negassi wrote:
>>>>
>>>> Hi Supun,
>>>> I have forked carbon-ml to my repo[1] and currently I am familiarizing
>>>> myself with the code and software architecture. I will make commits after
>>>> trying out with a new algorithm.
>>>>
>>>> [1] https://github.com/zemoel/carbon-ml
>>>>
>>>> On 26.04.2016 06:47, Supun Sethunga wrote:
>>>>
>>>> Hi Misgana,
>>>>
>>>> As you progress, please keep us posted too. It would be nice if you
>>>> can share your code as well (Github project). You can take a fork of repo
>>>> [1], and start working on your fork.
>>>>
>>>> [1] https://github.com/wso2/carbon-ml
>>>>
>>>> On Mon, Apr 25, 2016 at 7:57 PM, Misgana Negassi <negas...@tf.uni-freiburg.de> wrote:
>>>>
>>>>> Hi Supun,
>>>>>
>>>>> Thank you for accepting me for this project! I am excited to work on it
>>>>> and will start right away with the links you sent.
>>>>>
>>>>> Best,
>>>>> Misgana
>>>>>
>>>>>
>>>>>
>>>>> On 25.04.2016 12:06, Supun Sethunga wrote:
>>>>>
>>>>> Hi Misgana,
>>>>>
>>>>> Congratulations on getting accepted for GSoC 2016! Hope you are
>>>>> ready to get started with the project.
>>>>>
>>>>> To help you get more familiar with the code, I'm sharing the
>>>>> implementations of the current algorithms [1]. For your ensemble methods,
>>>>> you need to add three more cases (for the three types of ensembles) to
>>>>> the method [2]. You may try adding a new algorithm to the existing flow,
>>>>> and see how it works. Please feel free to raise any questions/issues you
>>>>> come across.
>>>>>
>>>>> [1]
>>>>> https://github.com/wso2/carbon-ml/tree/master/components/ml/org.wso2.carbon.ml.core/src/main/java/org/wso2/carbon/ml/core/spark/algorithms
>>>>> [2]
>>>>> https://github.com/wso2/carbon-ml/blob/master/components/ml/org.wso2.carbon.ml.core/src/main/java/org/wso2/carbon/ml/core/spark/algorithms/SupervisedSparkModelBuilder.java#L101
>>>>>
>>>>> Regards,
>>>>> Supun
>>>>>
>>>>> On Thu, Mar 24, 2016 at 9:31 PM, Misgana Negassi <negas...@tf.uni-freiburg.de> wrote:
>>>>>
>>>>>> Hi Supun,
>>>>>>
>>>>>> Thank you for your support and advice in this proposal process!
>>>>>>
>>>>>> In case you are interested, I am attaching my report paper, which
>>>>>> contains my work with ensemble methods, particularly Stacking.
>>>>>>
>>>>>> Best,
>>>>>> Misgana
>>>>>>
>>>>>>
>>>>>> On 24.03.2016 04:12, Supun Sethunga wrote:
>>>>>>
>>>>>> Looks good! Please go ahead and submit to GSoC.
>>>>>>
>>>>>> Thanks,
>>>>>> Supun
>>>>>>
>>>>>> On Thu, Mar 24, 2016 at 4:02 AM, Misgana Negassi <negas...@tf.uni-freiburg.de> wrote:
>>>>>>
>>>>>>> Hi Supun,
>>>>>>>
>>>>>>> I have added the changes you recommended. Could you kindly give me
>>>>>>> feedback?
>>>>>>>
>>>>>>> Best,
>>>>>>> Misgana
>>>>>>>
>>>>>>> On 23.03.2016 15:04, Supun Sethunga wrote:
>>>>>>>
>>>>>>> Hi Misgana,
>>>>>>>
>>>>>>> I went through your proposal. Overall it looks good. Here are a few
>>>>>>> comments I would like to point out:
>>>>>>>
>>>>>>>    - It's better to have some sort of architecture diagram
>>>>>>>    explaining your solution at a higher level.
>>>>>>>    - In the timeline, it is better to break down "Week 1-3 (May 23 -
>>>>>>>    June 20, 2016)" into three sub-levels, and allocate time slots for
>>>>>>>    each of the three methods (Stacking, Boosting and Bagging) separately.
>>>>>>>    That would make it easy for you to work on those methods separately,
>>>>>>>    as well as to track the progress.
>>>>>>>    - In the timeline, can you double-check the "week" numbers?
>>>>>>>    E.g. in [*Week 1-3 (May 23 - June 20, 2016)*], I guess it
>>>>>>>    should be "*Week 1-4*" (there are four weeks in the mentioned
>>>>>>>    duration). Similarly, check the others too.
>>>>>>>
>>>>>>> Please share the draft proposal with us once you fix those.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Supun
>>>>>>>
>>>>>>> On Wed, Mar 23, 2016 at 7:17 PM, Misgana Negassi <negas...@tf.uni-freiburg.de> wrote:
>>>>>>>
>>>>>>>> Hi Supun,
>>>>>>>>
>>>>>>>> I am attaching my proposal draft. I would be very grateful for your
>>>>>>>> comments.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Misgana
>>>>>>>>
>>>>>>>>
>>>>>>>> On 23.03.2016 04:54, Supun Sethunga wrote:
>>>>>>>>
>>>>>>>> Hi Misgana,
>>>>>>>>
>>>>>>>> As we have mentioned in the project proposal as well, the main
>>>>>>>> objective is to integrate ensemble support into the existing flow of the
>>>>>>>> WSO2 Machine Learner. We are focusing on three methods: Bagging,
>>>>>>>> Boosting and Stacking. (One technique for each of these methods)
>>>>>>>>
>>>>>>>> If you haven't tried it out already, you can get to know the Machine
>>>>>>>> Learner product by downloading and running it (please use link [1]
>>>>>>>> to download). The official documentation [2] and blog [3] will help you
>>>>>>>> with how to use the product. You can also go through the source code of
>>>>>>>> WSO2 ML ([4] and [5]), and get familiar with the current
>>>>>>>> implementations.
>>>>>>>>
>>>>>>>> In the meantime, as Nirmal mentioned, can you please send us the draft of
>>>>>>>> the proposal so that we can review it and give you feedback?
>>>>>>>>
>>>>>>>> [1] http://wso2.com/products/machine-learner/
>>>>>>>> [2] https://docs.wso2.com/display/ML100/Introducing+Machine+Learner
>>>>>>>> [3]
>>>>>>>> http://supunsetunga.blogspot.com/2015/09/building-your-first-predictive-model.html
>>>>>>>> [4] https://github.com/wso2/carbon-ml
>>>>>>>> [5] https://github.com/wso2/product-ml
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Supun
>>>>>>>>
>>>>>>>> On Wed, Mar 23, 2016 at 7:20 AM, Nirmal Fernando <nir...@wso2.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks, Misgana, for your interest in a WSO2 ML GSoC project.
>>>>>>>>> Whilst I let Supun give you some more information on the project, I
>>>>>>>>> encourage you to create a draft proposal and send it to us for review.
>>>>>>>>>
>>>>>>>>> On Wed, Mar 23, 2016 at 2:58 AM, Misgana Negassi <negas...@tf.uni-freiburg.de> wrote:
>>>>>>>>>
>>>>>>>>>> Hello!
>>>>>>>>>>
>>>>>>>>>> I am Misgana, hailing from Freiburg, Germany, and I am interested
>>>>>>>>>> in working with you on the Ensemble methods. I have already
>>>>>>>>>> implemented Stacking in Python (code available in github/zemoel) and
>>>>>>>>>> compared it to other ensemble methods such as Ensemble Selection on AUC
>>>>>>>>>> performance measures. The comparison also included using the
>>>>>>>>>> above-mentioned methods as part of an automated machine learning
>>>>>>>>>> platform (Autosklearn).
>>>>>>>>>>
>>>>>>>>>> I am currently working on my proposal and would be grateful for
>>>>>>>>>> your reply.
>>>>>>>>>>
>>>>>>>>>> Misgana
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> Thanks & regards,
>>>>>>>>> Nirmal
>>>>>>>>>
>>>>>>>>> Team Lead - WSO2 Machine Learner
>>>>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>>>>> Mobile: +94715779733
>>>>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Supun Sethunga*
>>>>>>>> Software Engineer
>>>>>>>> WSO2, Inc.
>>>>>>>> http://wso2.com/
>>>>>>>> lean | enterprise | middleware
>>>>>>>> Mobile : +94 716546324
>>>>>>>>
>>>>>>>>
>>>>>>>>


-- 
*Supun Sethunga*
Software Engineer
WSO2, Inc.
http://wso2.com/
lean | enterprise | middleware
Mobile : +94 716546324
_______________________________________________
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev
