[
https://issues.apache.org/jira/browse/MAHOUT-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12606994#action_12606994
]
adeneche edited comment on MAHOUT-56 at 6/22/08 3:42 AM:
-----------------------------------------------------------------
*What's new*
. class discovery: a genetic algorithm, based on the paper [Discovering
Comprehensible Classification Rules using Genetic
Programming|http://www.cs.bham.ac.uk/~wbl/biblio/gecco1999/GP-417.pdf], that
searches for the best binary classification rule for a given dataset. The
population, which is a list of candidate rules, is passed to each mapper, and
each mapper handles a subset of the dataset. All the new code is in the
package:
org.apache.mahout.gsoc.watchmaker.classdiscovery
. I refactored some classes from the previous patch to reuse the existing code.
The main change is the class STEvolutionEngine<T>, which uses a single thread,
and the corresponding STFitnessEvaluator<T>. More details will be added to the
comments
. I added the EasyMock library, which is needed to run the tests
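The class discovery idea above (one population of rules, each mapper handling a subset of the dataset) can be sketched in plain Java without Hadoop. This is only an illustration under assumptions, not the code in the patch: the names Counts, Example and RuleEvaluationSketch are hypothetical, rules are modelled as simple predicates, and the fitness is assumed to be sensitivity * specificity computed from per-rule confusion-matrix counts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Confusion-matrix counts of one rule over one subset (hypothetical helper). */
final class Counts {
    final int tp, fp, tn, fn;

    Counts(int tp, int fp, int tn, int fn) {
        this.tp = tp; this.fp = fp; this.tn = tn; this.fn = fn;
    }

    /** Counts are additive, so partial results from different subsets can be summed. */
    Counts plus(Counts o) {
        return new Counts(tp + o.tp, fp + o.fp, tn + o.tn, fn + o.fn);
    }

    /** Assumed fitness: sensitivity * specificity, 0 when either is undefined. */
    double fitness() {
        if (tp + fn == 0 || tn + fp == 0) return 0.0;
        return ((double) tp / (tp + fn)) * ((double) tn / (tn + fp));
    }
}

/** One labelled row of the dataset (hypothetical representation). */
final class Example {
    final boolean positive;
    final double[] attributes;

    Example(boolean positive, double... attributes) {
        this.positive = positive;
        this.attributes = attributes;
    }
}

final class RuleEvaluationSketch {
    /** "Map" step: evaluate every rule of the population against one subset. */
    static List<Counts> evaluate(List<Predicate<double[]>> population, List<Example> subset) {
        List<Counts> result = new ArrayList<>();
        for (Predicate<double[]> rule : population) {
            int tp = 0, fp = 0, tn = 0, fn = 0;
            for (Example e : subset) {
                boolean predicted = rule.test(e.attributes);
                if (predicted && e.positive) tp++;
                else if (predicted) fp++;
                else if (e.positive) fn++;
                else tn++;
            }
            result.add(new Counts(tp, fp, tn, fn));
        }
        return result;
    }

    /** "Reduce" step: sum the per-subset counts rule by rule. */
    static List<Counts> merge(List<List<Counts>> partials, int populationSize) {
        List<Counts> total = new ArrayList<>();
        for (int i = 0; i < populationSize; i++) {
            Counts sum = new Counts(0, 0, 0, 0);
            for (List<Counts> partial : partials) sum = sum.plus(partial.get(i));
            total.add(sum);
        }
        return total;
    }

    public static void main(String[] args) {
        List<Predicate<double[]>> population = new ArrayList<>();
        population.add(a -> a[0] > 0.5); // a single toy rule

        List<Example> first = new ArrayList<>();
        first.add(new Example(true, 0.9));
        first.add(new Example(false, 0.8));
        List<Example> second = new ArrayList<>();
        second.add(new Example(false, 0.1));
        second.add(new Example(true, 0.2));

        List<List<Counts>> partials = new ArrayList<>();
        partials.add(evaluate(population, first));
        partials.add(evaluate(population, second));
        System.out.println(merge(partials, population.size()).get(0).fitness());
    }
}
```

Because the counts are plain sums, merging the per-subset results gives the same fitness as evaluating each rule on the whole dataset in one pass.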
*What needs to be done*
The following steps remain before this patch can be considered complete:
. classdiscovery.ga.CDGA (the main tool) needs to become a fully functional
command-line tool
. for now CDGA uses the whole dataset for training; it should split it into a
training set and a testing set
. because classdiscovery is not generic (at least for now), I should move it to
the examples along with its corresponding tests
. arrange the comments
. there is no need to test the code against both TSP and Sudoku; I should remove
the Sudoku test to make the tests more comprehensible
. pass the population using the DistributedCache instead of a job parameter
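For the training/testing split listed above, a minimal in-memory sketch could look like the following. It is an assumption about one possible approach, not what CDGA does or will do: the class DatasetSplit, its split method and the fixed-seed shuffle are all hypothetical, and a distributed version would need a different mechanism.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Holds a training set and a testing set drawn from one dataset (hypothetical helper). */
final class DatasetSplit<T> {
    final List<T> training;
    final List<T> testing;

    private DatasetSplit(List<T> training, List<T> testing) {
        this.training = training;
        this.testing = testing;
    }

    /**
     * Shuffles a copy of the data with a seeded generator, then cuts it so that
     * roughly trainingFraction of the rows go to the training set.
     */
    static <T> DatasetSplit<T> split(List<T> data, double trainingFraction, long seed) {
        if (trainingFraction < 0.0 || trainingFraction > 1.0) {
            throw new IllegalArgumentException("trainingFraction must be in [0, 1]");
        }
        List<T> shuffled = new ArrayList<>(data); // leave the caller's list untouched
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(shuffled.size() * trainingFraction);
        return new DatasetSplit<>(
                new ArrayList<>(shuffled.subList(0, cut)),
                new ArrayList<>(shuffled.subList(cut, shuffled.size())));
    }
}
```

Using a fixed seed keeps the split reproducible between runs, so a rule evolved on the training set is always scored against the same testing set.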
was (Author: adeneche):
*What's new*
. class discovery: a genetic algorithm, based on the paper [Discovering
Comprehensible Classification Rules using Genetic
Programming|http://www.cs.bham.ac.uk/~wbl/biblio/gecco1999/GP-417.pdf], that
searches for the best binary classification rule for a given dataset. The
population, which is a list of candidate rules, is passed to each mapper, and
each mapper handles a subset of the dataset. All the new code is in the
package:
org.apache.mahout.gsoc.watchmaker.classdiscovery
. I refactored some classes from the previous patch to reuse the existing code.
The main change is the class STEvolutionEngine<T>, which uses a single thread,
and the corresponding STFitnessEvaluator<T>. More details will be added to the
comments
. I added the EasyMock library, which is needed to run the tests
*What needs to be done*
The following steps remain before this patch can be considered complete:
. classdiscovery.ga.CDGA (the main tool) needs to become a fully functional
command-line tool
. for now CDGA uses the whole dataset for training; it should split it into a
training set and a testing set
. because classdiscovery is not generic (at least for now), I should move it to
the examples along with its corresponding tests
. arrange the comments
. there is no need to test the code against both TSP and Sudoku; I should remove
the Sudoku test to make the tests more comprehensible
> Watchmaker Integration
> ----------------------
>
> Key: MAHOUT-56
> URL: https://issues.apache.org/jira/browse/MAHOUT-56
> Project: Mahout
> Issue Type: Task
> Reporter: Deneche A. Hakim
> Assignee: Grant Ingersoll
> Attachments: watchmaker-tsp.patch, watchmaker-tsp.patch,
> watchmaker-tsp.patch, watchmaker-tsp.patch
>
>
> The goal of this task is to allow Watchmaker-defined problems to be solved in
> Mahout.