[ 
https://issues.apache.org/jira/browse/MAHOUT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim updated MAHOUT-122:
------------------------------------

    Attachment: 3w_patch.diff

*3rd Week Patch*
work in progress...

*Changes*

* ForestBuilder becomes an object that uses a TreeBuilder object

* RandomForest represents a...guess what! It has methods to classify single 
instances or a batch of data. It also contains methods to compute the total and 
mean number of nodes and the mean max depth of the trees

* Added more PredictionCallback implementations!

** MeanTreeCollector computes the mean classification error among all the trees 
of the forest

** MultiCallback allows many callbacks to be passed to the same classification 
method

* BreimanExample is a runnable example similar to the testing procedure used in 
Breiman's paper on Random Forests

* MemoryUsage is a runnable app that collects memory usage statistics

* DataSplit, a temporary app, allows splitting the KDD dataset (1%, 10%, 25%, 
50%)

* TreeBuilder is an abstract class that builds a Decision Tree given a Data 
instance

* DefaultTreeBuilder is an implementation of TreeBuilder based on Andrew W. 
Moore's Decision Trees tutorial
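
To make the relationship between the forest and the callbacks concrete, here is 
a minimal sketch of how majority-vote classification could feed per-tree 
predictions into a callback. It is not taken from 3w_patch.diff: the Tree 
interface, the ForestSketch class, the prediction(...) signature and the 
meanTreeError() method are assumptions made for illustration, and the real 
classes in the patch may look different.

```java
import java.util.List;

// Hypothetical stand-in for a decision tree; not the patch's actual type.
interface Tree {
  int classify(double[] instance);
}

// Assumed callback shape: called once per (tree, instance) prediction.
interface PredictionCallback {
  void prediction(int treeId, int instanceId, int prediction);
}

/** Forwards every prediction to several callbacks. */
class MultiCallback implements PredictionCallback {
  private final PredictionCallback[] callbacks;

  MultiCallback(PredictionCallback... callbacks) {
    this.callbacks = callbacks;
  }

  @Override
  public void prediction(int treeId, int instanceId, int prediction) {
    for (PredictionCallback callback : callbacks) {
      callback.prediction(treeId, instanceId, prediction);
    }
  }
}

/** Collects the mean classification error among all the trees. */
class MeanTreeCollector implements PredictionCallback {
  private final int[] labels; // true label of each instance
  private final int[] errors; // number of misclassifications per tree

  MeanTreeCollector(int[] labels, int nbTrees) {
    this.labels = labels;
    this.errors = new int[nbTrees];
  }

  @Override
  public void prediction(int treeId, int instanceId, int prediction) {
    if (prediction != labels[instanceId]) {
      errors[treeId]++;
    }
  }

  /** Average, over the trees, of each tree's error rate. */
  double meanTreeError() {
    double sum = 0.0;
    for (int e : errors) {
      sum += (double) e / labels.length;
    }
    return sum / errors.length;
  }
}

/** Hypothetical forest: classifies by majority vote over its trees. */
class ForestSketch {
  private final List<Tree> trees;

  ForestSketch(List<Tree> trees) {
    this.trees = trees;
  }

  int classify(double[] instance, int instanceId, int nbClasses,
               PredictionCallback callback) {
    int[] votes = new int[nbClasses];
    for (int treeId = 0; treeId < trees.size(); treeId++) {
      int prediction = trees.get(treeId).classify(instance);
      votes[prediction]++;
      if (callback != null) {
        callback.prediction(treeId, instanceId, prediction);
      }
    }
    int best = 0; // ties resolve to the lowest class index
    for (int c = 1; c < nbClasses; c++) {
      if (votes[c] > votes[best]) {
        best = c;
      }
    }
    return best;
  }
}
```

Passing a MultiCallback to classify(...) lets one pass over the data feed 
several collectors at once, which is the role the MultiCallback bullet 
describes.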

*What's next*
* some more memory usage tests
* I think it's time to start on the map-reduce implementation; the results of 
the memory usage tests should help us decide which implementation to pursue
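
For the memory usage tests, one simple way to get a rough number on the JVM is 
to compare used heap before and after running a task. The sketch below only 
illustrates that idea; MemoryProbe and its methods are hypothetical names, and 
this is not necessarily how the MemoryUsage app in the patch collects its 
stats.

```java
// Hypothetical helper, not part of the patch.
class MemoryProbe {

  /** Heap currently in use, in bytes, as reported by the JVM. */
  static long usedMemory() {
    Runtime runtime = Runtime.getRuntime();
    return runtime.totalMemory() - runtime.freeMemory();
  }

  /**
   * Runs the task and returns the approximate change in used heap, in bytes.
   * The value is only an estimate and can be negative if the garbage
   * collector runs while the task executes.
   */
  static long measure(Runnable task) {
    System.gc(); // only a hint; the JVM is free to ignore it
    long before = usedMemory();
    task.run();
    return usedMemory() - before;
  }
}
```

The task should keep a reference to whatever it builds (for example the 
forest), otherwise the objects may be collected before the second reading.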

> Random Forests Reference Implementation
> ---------------------------------------
>
>                 Key: MAHOUT-122
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-122
>             Project: Mahout
>          Issue Type: Task
>          Components: Classification
>    Affects Versions: 0.2
>            Reporter: Deneche A. Hakim
>         Attachments: 2w_patch.diff, 3w_patch.diff, RF reference.patch
>
>   Original Estimate: 25h
>  Remaining Estimate: 25h
>
> This is the first step of my GSOC project. Implement a simple, easy to 
> understand, reference implementation of Random Forests (Building and 
> Classification). The only requirement here is that "it works"

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
