Hi Misgana,

> I committed the code for reading a CSV file. My next task will be sampling and starting to implement an ensemble method (Stacking).
I went through the code, and I would like to suggest a small thing. Most of the Spark algorithms need JavaRDDs as the input for datasets. Hence, reading your file as a JavaRDD<LabeledPoint> is a better approach than reading it as a list of labeled points. Please refer to [1] and [2] for an example.

> - How to decide which models to use for an ensemble and which parameters?

The type of model/algorithm has to be a user input. The parameters will depend on the algorithm the user picks.

> - Should the ensemble methods be implemented as a wrapper around the base-models?

Yes. You can use the existing algorithms in WSO2 Machine Learner as the base-models. (I have shared those in my previous mail.)

> - Which library to use for matrix operations? Is Apache commons.math.Linearalgebra ok?

Yes, Apache commons.math.* would be fine. In fact, you can use any library with an open-source license.

> What do you think about a hangout session to clarify stuff and get to know each other. :)

Of course! Please arrange a time slot (hopefully one that is friendly to the IST time zone, GMT+5:30 :) ) and send me a calendar invite.

[1] https://github.com/wso2/carbon-ml/blob/master/components/ml/org.wso2.carbon.ml.core/src/main/java/org/wso2/carbon/ml/core/utils/MLUtils.java#L58
[2] https://github.com/wso2/carbon-ml/blob/master/components/ml/org.wso2.carbon.ml.core/src/main/java/org/wso2/carbon/ml/core/spark/algorithms/SupervisedSparkModelBuilder.java#L87

Regards,
Supun

On Sat, May 7, 2016 at 8:46 PM, Misgana Negassi <negas...@tf.uni-freiburg.de> wrote:

> Hi Supun,
> I committed the code for reading a CSV file. My next task will be sampling and starting to implement an ensemble method (Stacking).
> I have some questions about:
> - How to decide which models to use for an ensemble and which parameters?
> - Should the ensemble methods be implemented as a wrapper around the base-models?
> - Which library to use for matrix operations? Is Apache commons.math.Linearalgebra ok?
>
> What do you think about a hangout session to clarify stuff and get to know each other. :)
>
> Have a nice weekend!
> Misgana
>
>
> On 05.05.2016 19:59, Misgana Negassi wrote:
>
> Hi Supun,
> Thank you for the detailed explanation.
>
> I switched to IntelliJ IDEA as an IDE on Ubuntu 14.04. The import errors were resolved after the project was imported as a Maven project. This took a while because of persistent pom.xml errors; I am still in the process of reading about Maven, Spark, REST and the Carbon architecture.
> I have created an independent Maven project [1] for the implementation of the ensemble methods. Currently I am writing code for reading a CSV file using the Apache library and converting it into a JavaRDD. I will commit once I am done with it.
>
> Regards,
> Misgana
>
>
> [1] https://github.com/zemoel/ensemble-methods
>
>
> On 04.05.2016 06:33, Supun Sethunga wrote:
>
> Hi Misgana,
>
> It seems you have misunderstood the "carbon" architecture. Let me explain it.
> The *carbon-ml* repo contains the source code of the OSGi bundles (i.e. jar libraries), which contain the actual implementation/logic (such as the implementation of importing datasets, creating projects, building models, etc.). Hence, these are similar to any other third-party library, which you have to invoke through their APIs. (They don't have main classes; they have APIs.) This repo also contains a REST API, which exposes the above-mentioned APIs as a RESTful service. It also includes the source code for the UI, which invokes those REST APIs on behalf of the user. Please refer to [1] to get the overall idea.
> The *product-ml* repo, on the other hand, contains the source code that collects the necessary libraries (such as the libraries of carbon-ml, the REST API and the UI) and bundles them all together to create the final binary distribution.
>
> *carbon-ml* already contains the implementations of a number of algorithms [2].
> Your ultimate goal is to add three more such implementations to that repo (i.e. the three ensemble methods). In doing so, you don't need to re-implement the logic for importing datasets, creating projects, etc., as those have already been implemented and you can use them through their APIs. (Please refer to any of the current algorithms to get an idea.)
>
> But, since it can be difficult to implement the ensemble logic and integrate it into the carbon-ml repo *at the same time*, we recommend that you *first implement your logic in a separate Java client.* This has to be an independent Maven project. The whole purpose of this Java client is to implement your logic independently and test its functionality and accuracy. You can use the native Spark MLlib libraries for this. In the Java client, the following steps need to be done:
>
> - Read the CSV file.
> - Do the sampling as needed (train set and test set).
> - Train an ensemble model using the train set.
> - Do prediction on the test set and evaluate the accuracy.
>
> (Do not worry about the project concept at this point.)
>
> Once you are satisfied with the results, you can integrate that logic into the carbon-ml repo (into [2]).
>
> Please push whatever code you write for this Java client to Git, and share it with us too.
>
>> Importing classes, e.g. "import org.wso2.carbon.ml.core.interfaces.MLModelBuilder", doesn't resolve. I tried to solve the issue by cleaning the project, or adding the project to the Spark build_path, with no success.
>
> Did you get this when you tried to import carbon-ml into Eclipse? Did you import it as a Maven project?
>
>
> In another topic:
>> I downloaded the machine learner (binary) following the instructions. After logging in with admin, I didn't get the interface as it is explained in the "Building Your First Predictive Model with WSO2 Machine Learner" tutorial.
>
> Can you please share a screenshot of what you get once you log in?
> Do you see any errors in the console? If so, can you please share them too?
>
> Hope I have answered all your questions.
>
> [1] https://docs.wso2.com/display/ML110/Architecture
> [2] https://github.com/wso2/carbon-ml/tree/master/components/ml/org.wso2.carbon.ml.core/src/main/java/org/wso2/carbon/ml/core/spark/algorithms
>
> Regards,
> Supun
>
> On Tue, May 3, 2016 at 9:08 PM, Misgana Negassi <negas...@tf.uni-freiburg.de> wrote:
>
>> Hi Supun,
>>
>> I did "mvn clean install" and downloaded the product-ml source code as well.
>>
>> My understanding of this workflow is that I create a new project for reading a CSV file, import an ensemble model which was built with Spark (e.g. Random Forest), and also import the SupervisedSparkModelBuilder. (Am I right?)
>> I did these steps and am currently trying to solve these issues:
>> - Importing classes, e.g. "import org.wso2.carbon.ml.core.interfaces.MLModelBuilder", doesn't resolve. I tried to solve the issue by cleaning the project, or adding the project to the Spark build_path, with no success.
>> - Should I create my standalone project as a Maven project or a normal Java project? Inside core or as an independent project?
>> - I couldn't find a main class where I can run and see the output of the models.
>>
>> In another topic:
>> I downloaded the machine learner (binary) following the instructions. After logging in with admin, I didn't get the interface as it is explained in the "Building Your First Predictive Model with WSO2 Machine Learner" tutorial.
>>
>>
>> My apologies for asking so many questions. It has been a while since I worked with Eclipse and Java.
>>
>> Best,
>> Misgana
>>
>>
>> On 03.05.2016 06:28, Supun Sethunga wrote:
>>
>> Hi Misgana,
>>
>> Any update on the progress?
>>
>> Regards,
>>
>> On Thu, Apr 28, 2016 at 7:10 PM, Supun Sethunga <sup...@wso2.com> wrote:
>>
>>> Hi Misgana,
>>>
>>> Please find the answers inline.
>>>
>>>> 1. Do I need only to work with carbon-ml repo or should the whole kernel be installed?
>>>
>>> You don't need to build the kernel. Building carbon-ml [1] and then product-ml [2] would be enough.
>>>
>>>> The Build from Source documentation site varies from what I have set up (uses svn, downloads whole kernel). Should I follow this?
>>>
>>> Just download the source code ([1] and [2]), and execute "mvn clean install" from the source directory. As I mentioned earlier, there is no need to download or build the Carbon kernel.
>>>
>>>> Could you suggest how to set up my dev environment? Currently I installed Spark, converted my project to a Maven project. But Maven seems not to compile properly.
>>>
>>> No need to install Spark. Spark is only used as an external library (jars). If you are using Eclipse/IntelliJ IDEA, import the source code as a Maven project. The IDE will automatically resolve the dependencies.
>>>
>>>> I implemented Gradientboosted to core/spark/Algorithms.
>>>
>>> What's the purpose of implementing Gradientboosted?
>>>
>>>> Next step would be to modify *SupervisedSparkModelBuilder*.
>>>
>>> I think it would be easier for you to first (after finishing the above steps) write a simple standalone Java client, which reads a simple dataset (a CSV file) and builds the ensemble model with Spark. Then you can integrate that logic into the SupervisedSparkModelBuilder, and eventually into the model-building workflow of WSO2 ML.
>>>
>>> [1] https://github.com/wso2/carbon-ml
>>> [2] https://github.com/wso2/product-ml
>>> [3] https://docs.wso2.com/display/ML110/Building+from+Source
>>>
>>> Regards,
>>> Supun
>>>
>>>
>>> On Thu, Apr 28, 2016 at 5:22 PM, misgana <misgananega...@gmail.com> wrote:
>>>
>>>> Hi Supun,
>>>>
>>>> My current workflow looks like this:
>>>> 1. Fork and clone carbon-ml repo from GitHub -- DONE
>>>> 2. Set up dev environment -- IN PROGRESS
>>>> 3. Integrate GradientBoosted tree algorithm into carbon-ml -- IN PROGRESS
>>>>
>>>> Issues:
>>>> 1. Do I need only to work with carbon-ml repo or should the whole kernel be installed?
>>>> 2. The Build from Source documentation site varies from what I have set up (uses svn, downloads whole kernel). Should I follow this?
>>>> 3. Could you suggest how to set up my dev environment? Currently I installed Spark, converted my project to a Maven project. But Maven seems not to compile properly.
>>>> 4. I implemented Gradientboosted to core/spark/Algorithms. Next step would be to modify *SupervisedSparkModelBuilder*.
>>>> 5. Here I would check how this would be integrated in the whole framework and test on the Iris dataset. (On this I need to do some reading.)
>>>>
>>>> I would very much appreciate your guidance on this plan/work in progress.
>>>>
>>>> Best,
>>>> Misgana
>>>>
>>>>
>>>> On 26.04.2016 09:26, Misgana Negassi wrote:
>>>>
>>>> Hi Supun,
>>>> I have forked carbon-ml to my repo [1] and currently I am familiarizing myself with the code and software architecture. I will make commits after trying it out with a new algorithm.
>>>>
>>>> [1] https://github.com/zemoel/carbon-ml
>>>>
>>>> On 26.04.2016 06:47, Supun Sethunga wrote:
>>>>
>>>> Hi Misgana,
>>>>
>>>> As you progress, please keep us posted too. It would be nice if you could share your code as well (GitHub project). You can take a fork of repo [1] and start working on your fork.
>>>>
>>>> [1] https://github.com/wso2/carbon-ml
>>>>
>>>> On Mon, Apr 25, 2016 at 7:57 PM, Misgana Negassi <negas...@tf.uni-freiburg.de> wrote:
>>>>
>>>>> Hi Supun,
>>>>>
>>>>> Thank you for accepting me for this project! I am excited to work on it and will start right away with the links you sent.
>>>>>
>>>>> Best,
>>>>> Misgana
>>>>>
>>>>>
>>>>> On 25.04.2016 12:06, Supun Sethunga wrote:
>>>>>
>>>>> Hi Misgana,
>>>>>
>>>>> Congratulations on getting accepted for GSoC 2016! Hope you are ready to get started with the project.
>>>>>
>>>>> To get more familiar with the code, I'm sharing the implementations of the current algorithms [1]. For your ensemble methods, you need to add three more cases (for the three types of ensembles) to the method [2]. You may try adding a new algorithm to the existing flow and see how it works. Please feel free to raise any questions/issues you come across.
>>>>>
>>>>> [1] https://github.com/wso2/carbon-ml/tree/master/components/ml/org.wso2.carbon.ml.core/src/main/java/org/wso2/carbon/ml/core/spark/algorithms
>>>>> [2] https://github.com/wso2/carbon-ml/blob/master/components/ml/org.wso2.carbon.ml.core/src/main/java/org/wso2/carbon/ml/core/spark/algorithms/SupervisedSparkModelBuilder.java#L101
>>>>>
>>>>> Regards,
>>>>> Supun
>>>>>
>>>>> On Thu, Mar 24, 2016 at 9:31 PM, Misgana Negassi <negas...@tf.uni-freiburg.de> wrote:
>>>>>
>>>>>> Hi Supun,
>>>>>>
>>>>>> Thank you for your support and advice in this proposal process!
>>>>>>
>>>>>> In case you are interested, I am attaching my report paper, which contains my work with ensemble methods, particularly Stacking.
>>>>>>
>>>>>> Best,
>>>>>> Misgana
>>>>>>
>>>>>>
>>>>>> On 24.03.2016 04:12, Supun Sethunga wrote:
>>>>>>
>>>>>> Looks good! Please go ahead and submit to GSoC.
>>>>>>
>>>>>> Thanks,
>>>>>> Supun
>>>>>>
>>>>>> On Thu, Mar 24, 2016 at 4:02 AM, Misgana Negassi <negas...@tf.uni-freiburg.de> wrote:
>>>>>>
>>>>>>> Hi Supun,
>>>>>>>
>>>>>>> I have added the changes you recommended. Could you kindly give me feedback?
>>>>>>>
>>>>>>> Best,
>>>>>>> Misgana
>>>>>>>
>>>>>>> On 23.03.2016 15:04, Supun Sethunga wrote:
>>>>>>>
>>>>>>> Hi Misgana,
>>>>>>>
>>>>>>> I went through your proposal. Overall it looks good. Here are a few comments I would like to point out:
>>>>>>>
>>>>>>> - It's better to have some sort of architecture diagram explaining your solution at a higher level.
>>>>>>> - In the timeline, it would be better to break down "Week 1-3 (May 23 - June 20, 2016)" into three sub-levels and allocate time slots for each of the three methods (Stacking, Boosting and Bagging) separately. That would make it easy for you to work on those methods separately, as well as to track the progress.
>>>>>>> - In the timeline, can you double-check the "week" numbers? For example, in *Week 1-3 (May 23 - June 20, 2016)*, I guess it should be "*Week 1-4*" (there are four weeks in the mentioned duration). Similarly, check the others too.
>>>>>>>
>>>>>>> Please share the draft proposal with us once you fix those.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Supun
>>>>>>>
>>>>>>> On Wed, Mar 23, 2016 at 7:17 PM, Misgana Negassi <negas...@tf.uni-freiburg.de> wrote:
>>>>>>>
>>>>>>>> Hi Supun,
>>>>>>>>
>>>>>>>> I am attaching my proposal draft. I would be very grateful for your comments.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Misgana
>>>>>>>>
>>>>>>>>
>>>>>>>> On 23.03.2016 04:54, Supun Sethunga wrote:
>>>>>>>>
>>>>>>>> Hi Misgana,
>>>>>>>>
>>>>>>>> As we have mentioned in the project proposal as well, the main objective is to integrate ensemble support into the existing flow of the WSO2 Machine Learner. We are focusing on three methods: Bagging, Boosting and Stacking (one technique for each of these methods).
>>>>>>>>
>>>>>>>> If you haven't tried it out already, you can get to know the Machine Learner product by downloading and running it (please use link [1] to download). The official documentation [2] and blog [3] will help you learn how to use the product. You can also go through the source code of WSO2 ML ([4] and [5]) and get familiar with the current implementations.
>>>>>>>>
>>>>>>>> Meanwhile, as Nirmal mentioned, can you please send us the draft of the proposal so that we can review it and give you feedback?
>>>>>>>>
>>>>>>>> [1] http://wso2.com/products/machine-learner/
>>>>>>>> [2] https://docs.wso2.com/display/ML100/Introducing+Machine+Learner
>>>>>>>> [3] http://supunsetunga.blogspot.com/2015/09/building-your-first-predictive-model.html
>>>>>>>> [4] https://github.com/wso2/carbon-ml
>>>>>>>> [5] https://github.com/wso2/product-ml
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Supun
>>>>>>>>
>>>>>>>> On Wed, Mar 23, 2016 at 7:20 AM, Nirmal Fernando <nir...@wso2.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks, Misgana, for your interest in a WSO2 ML GSoC project. Whilst I let Supun give you some more information on the project, I encourage you to create a draft proposal and send it to us for review.
>>>>>>>>>
>>>>>>>>> On Wed, Mar 23, 2016 at 2:58 AM, Misgana Negassi <negas...@tf.uni-freiburg.de> wrote:
>>>>>>>>>
>>>>>>>>>> Hello!
>>>>>>>>>>
>>>>>>>>>> I am Misgana, hailing from Freiburg, Germany, and I am interested in working with you on the ensemble methods. I have already implemented Stacking in Python (code available at github/zemoel) and compared it to other ensemble methods, such as Ensemble Selection, on AUC performance measures. The comparison also included using the above-mentioned methods as part of an automated machine learning platform (Autosklearn).
>>>>>>>>>>
>>>>>>>>>> I am currently working on my proposal and would be grateful for your reply.
>>>>>>>>>>
>>>>>>>>>> Misgana
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> Thanks & regards,
>>>>>>>>> Nirmal
>>>>>>>>>
>>>>>>>>> Team Lead - WSO2 Machine Learner
>>>>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>>>>> Mobile: +94715779733
>>>>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Supun Sethunga*
>>>>>>>> Software Engineer
>>>>>>>> WSO2, Inc.
>>>>>>>> http://wso2.com/
>>>>>>>> lean | enterprise | middleware
>>>>>>>> Mobile : +94 716546324
_______________________________________________
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev
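The four steps Supun lists for the standalone client (read the CSV file, split it into train and test sets, train an ensemble on the train set, evaluate accuracy on the test set), together with his advice to build the ensemble as a wrapper around base models, can be illustrated with a small plain-Java sketch. It deliberately avoids Spark so that it runs on its own: `Point` stands in for MLlib's `LabeledPoint`, and `Model`, `Stump`, `Logistic`, `Stacking` and `EnsembleSketch` are hypothetical names, not part of carbon-ml or Spark. The sketch assumes 0/1 labels and a CSV layout of numeric features followed by the label in the last column, and it uses the holdout variant of stacking (often called blending) rather than cross-validated stacking. In the actual Spark client, the same per-line parsing would instead be applied with `JavaRDD.map` to produce a `JavaRDD<LabeledPoint>`, as recommended in the thread, and `JavaRDD.randomSplit` would take the place of the `split` helper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Stand-in for Spark MLlib's LabeledPoint: a 0/1 label plus a dense feature vector.
final class Point {
    final double label;
    final double[] features;
    Point(double label, double[] features) { this.label = label; this.features = features; }
}

// Hypothetical base-model contract (not an existing carbon-ml or Spark interface).
interface Model {
    void train(List<Point> data);
    double predict(double[] features);
}

// Toy base learner: thresholds one feature at the midpoint of the two class means.
final class Stump implements Model {
    private final int feature;
    private double threshold;
    private double labelAbove; // label predicted when the feature value is >= threshold
    Stump(int feature) { this.feature = feature; }
    public void train(List<Point> data) {
        double mean0 = data.stream().filter(p -> p.label == 0).mapToDouble(p -> p.features[feature]).average().orElse(0);
        double mean1 = data.stream().filter(p -> p.label == 1).mapToDouble(p -> p.features[feature]).average().orElse(0);
        threshold = (mean0 + mean1) / 2;
        labelAbove = mean1 >= mean0 ? 1 : 0;
    }
    public double predict(double[] x) { return x[feature] >= threshold ? labelAbove : 1 - labelAbove; }
}

// Meta-learner: logistic regression fitted by batch gradient descent (rate 0.5, 1000 epochs).
final class Logistic implements Model {
    private double[] w;
    private double b;
    public void train(List<Point> data) {
        w = new double[data.get(0).features.length];
        b = 0;
        for (int epoch = 0; epoch < 1000; epoch++) {
            double[] gw = new double[w.length];
            double gb = 0;
            for (Point p : data) {
                double err = probability(p.features) - p.label; // derivative of the log loss w.r.t. z
                for (int j = 0; j < w.length; j++) gw[j] += err * p.features[j];
                gb += err;
            }
            for (int j = 0; j < w.length; j++) w[j] -= 0.5 * gw[j] / data.size();
            b -= 0.5 * gb / data.size();
        }
    }
    private double probability(double[] x) {
        double z = b;
        for (int j = 0; j < w.length; j++) z += w[j] * x[j];
        return 1 / (1 + Math.exp(-z));
    }
    public double predict(double[] x) { return probability(x) >= 0.5 ? 1 : 0; }
}

// Stacking as a wrapper (holdout/"blending" variant): base models fit one half, the meta-learner the other.
final class Stacking implements Model {
    private final List<Model> base;
    private final Model meta;
    Stacking(List<Model> base, Model meta) { this.base = base; this.meta = meta; }
    public void train(List<Point> data) {
        List<List<Point>> halves = EnsembleSketch.split(data, 0.5, 7L);
        for (Model m : base) m.train(halves.get(0));
        List<Point> metaData = new ArrayList<>();
        // Meta-features are the base models' predictions on rows they were not trained on.
        for (Point p : halves.get(1)) metaData.add(new Point(p.label, basePredictions(p.features)));
        meta.train(metaData);
    }
    public double predict(double[] x) { return meta.predict(basePredictions(x)); }
    private double[] basePredictions(double[] x) { return base.stream().mapToDouble(m -> m.predict(x)).toArray(); }
}

final class EnsembleSketch {
    // Assumes each CSV line is "f1,...,fn,label": numeric values, label in the last column.
    static Point parseLine(String line) {
        String[] cells = line.split(",");
        double[] features = new double[cells.length - 1];
        for (int i = 0; i < features.length; i++) features[i] = Double.parseDouble(cells[i].trim());
        return new Point(Double.parseDouble(cells[cells.length - 1].trim()), features);
    }
    // Shuffles a copy with a fixed seed; returns [first `fraction` of the rows, the rest].
    static List<List<Point>> split(List<Point> data, double fraction, long seed) {
        List<Point> copy = new ArrayList<>(data);
        Collections.shuffle(copy, new Random(seed));
        int cut = (int) Math.round(fraction * copy.size());
        return Arrays.asList(copy.subList(0, cut), copy.subList(cut, copy.size()));
    }
    static double accuracy(Model model, List<Point> test) {
        return test.stream().filter(p -> model.predict(p.features) == p.label).count() / (double) test.size();
    }
}
```

A typical use would be `new Stacking(Arrays.asList(new Stump(0), new Stump(1)), new Logistic())`, trained on the train split and scored with `EnsembleSketch.accuracy` on the test split. Because the base models here see only half of the training rows, cross-validated stacking (training the meta-learner on out-of-fold predictions) usually makes better use of small datasets, at the cost of training each base model once per fold.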