[ 
https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12606994#action_12606994
 ] 

adeneche edited comment on MAHOUT-56 at 6/22/08 3:42 AM:
-----------------------------------------------------------------

*What's new*
. Class discovery: a genetic algorithm, based on the paper [Discovering Comprehensible Classification Rules using Genetic Programming|http://www.cs.bham.ac.uk/~wbl/biblio/gecco1999/GP-417.pdf], that searches for the best binary classification rule for a given dataset. The population (a list of candidate rules) is passed to each mapper, and each mapper handles a subset of the dataset. All the new code is in the package:

org.apache.mahout.gsoc.watchmaker.classdiscovery

. I refactored some classes from the previous patch so that the existing code can be reused. The main change is the class STEvolutionEngine<T>, which uses a single thread, and the corresponding STFitnessEvaluator<T>. More details will be added to the code comments.

. I added the EasyMock library, which is needed to run the tests.
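To make the distributed evaluation described above more concrete, here is a minimal sketch in plain Java (no Hadoop) of how a population of rules could be scored when each mapper sees only one partition of the data: every "mapper" computes partial confusion counts for every rule, and a "reduce" step sums them and turns them into a fitness value. Everything here is an assumption for illustration: the rule encoding (a conjunction of `attribute >= threshold` tests), the class and method names, and the fitness function (sensitivity * specificity) are hypothetical and are not taken from CDGA, STEvolutionEngine<T> or STFitnessEvaluator<T>.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from the MAHOUT-56 patch.
final class RuleFitnessSketch {

  /** Assumed rule encoding: predict "positive" when every row[attributes[i]] >= thresholds[i]. */
  static final class Rule {
    final int[] attributes;
    final double[] thresholds;

    Rule(int[] attributes, double[] thresholds) {
      this.attributes = attributes;
      this.thresholds = thresholds;
    }

    boolean covers(double[] row) {
      for (int i = 0; i < attributes.length; i++) {
        if (row[attributes[i]] < thresholds[i]) {
          return false;
        }
      }
      return true;
    }
  }

  /** "Map" step: confusion counts {tp, fp, tn, fn} of every rule on one data partition. */
  static long[][] mapPartition(List<Rule> population, double[][] rows, boolean[] labels) {
    long[][] counts = new long[population.size()][4];
    for (int r = 0; r < population.size(); r++) {
      Rule rule = population.get(r);
      for (int i = 0; i < rows.length; i++) {
        boolean predicted = rule.covers(rows[i]);
        // 0 = tp, 1 = fp, 2 = tn, 3 = fn
        int cell = predicted ? (labels[i] ? 0 : 1) : (labels[i] ? 3 : 2);
        counts[r][cell]++;
      }
    }
    return counts;
  }

  /** "Reduce" step: sum the partial counts, then fitness = sensitivity * specificity (assumed). */
  static double[] reduce(List<long[][]> partials, int populationSize) {
    double[] fitness = new double[populationSize];
    for (int r = 0; r < populationSize; r++) {
      long tp = 0, fp = 0, tn = 0, fn = 0;
      for (long[][] p : partials) {
        tp += p[r][0];
        fp += p[r][1];
        tn += p[r][2];
        fn += p[r][3];
      }
      double sensitivity = (tp + fn) == 0 ? 0.0 : (double) tp / (tp + fn);
      double specificity = (tn + fp) == 0 ? 0.0 : (double) tn / (tn + fp);
      fitness[r] = sensitivity * specificity;
    }
    return fitness;
  }

  public static void main(String[] args) {
    // Two partitions, as if handled by two mappers; attribute 0 separates the classes.
    List<Rule> population = Arrays.asList(
        new Rule(new int[] {0}, new double[] {0.5}),  // separates the classes perfectly
        new Rule(new int[] {0}, new double[] {0.1})); // covers every row

    List<long[][]> partials = new ArrayList<>();
    partials.add(mapPartition(population, new double[][] {{0.9}, {0.2}}, new boolean[] {true, false}));
    partials.add(mapPartition(population, new double[][] {{0.7}, {0.4}}, new boolean[] {true, false}));
    System.out.println(Arrays.toString(reduce(partials, population.size()))); // prints [1.0, 0.0]
  }
}
```

Summing raw counts before computing the fitness, rather than averaging per-partition fitness values, gives the same result as evaluating each rule on the whole dataset, regardless of how the rows are distributed among the mappers.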

*What needs to be done*
The following steps need to be done before considering this patch to be 
complete:
. classdiscovery.ga.CDGA (the main tool) needs to become a fully functional command-line tool
. for now CDGA uses the whole dataset for training; it should split it into a training set and a testing set
. because classdiscovery is not generic (at least for now), I should move it, along with its corresponding tests, to the examples
. tidy up the comments
. there is no need to test the code against both TSP and Sudoku; I should remove the Sudoku test to make the tests easier to follow
. pass the population using the DistributedCache instead of a job parameter
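For the training/testing split mentioned in the list above, a minimal sketch could shuffle a copy of the dataset with a fixed seed and cut it at a given fraction. The class `TrainTestSplit`, its `split` method and the `trainFraction`/`seed` parameters are hypothetical and not part of the patch; how CDGA actually loads and represents its rows is not assumed here, so the rows are just a generic `List<T>`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of the MAHOUT-56 patch.
final class TrainTestSplit {

  /** Holds the two disjoint parts of a split. */
  static final class Split<T> {
    final List<T> train;
    final List<T> test;

    Split(List<T> train, List<T> test) {
      this.train = train;
      this.test = test;
    }
  }

  /**
   * Shuffles a copy of the dataset with a fixed seed (so runs are repeatable)
   * and puts the first trainFraction of the shuffled rows in the training set.
   */
  static <T> Split<T> split(List<T> dataset, double trainFraction, long seed) {
    if (trainFraction < 0.0 || trainFraction > 1.0) {
      throw new IllegalArgumentException("trainFraction must be in [0, 1]");
    }
    List<T> shuffled = new ArrayList<>(dataset);
    Collections.shuffle(shuffled, new Random(seed));
    int cut = (int) Math.round(shuffled.size() * trainFraction);
    return new Split<>(new ArrayList<>(shuffled.subList(0, cut)),
                       new ArrayList<>(shuffled.subList(cut, shuffled.size())));
  }

  public static void main(String[] args) {
    List<Integer> rows = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      rows.add(i);
    }
    Split<Integer> s = split(rows, 0.7, 42L);
    System.out.println(s.train.size() + " training rows, " + s.test.size() + " testing rows");
  }
}
```

Fixing the seed makes a given run reproducible while still avoiding any ordering bias in the input file; the fitness would then be computed on `train` only, and `test` used to report how well the best rule generalizes.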

> Watchmaker Integration
> ----------------------
>
>                 Key: MAHOUT-56
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-56
>             Project: Mahout
>          Issue Type: Task
>            Reporter: Deneche A. Hakim
>            Assignee: Grant Ingersoll
>         Attachments: watchmaker-tsp.patch, watchmaker-tsp.patch, 
> watchmaker-tsp.patch, watchmaker-tsp.patch
>
>
> The goal of this task is to allow Watchmaker-defined problems to be solved in 
> Mahout.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
