[ 
https://issues.apache.org/jira/browse/MAHOUT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717024#action_12717024
 ] 

Deneche A. Hakim commented on MAHOUT-122:
-----------------------------------------

I've been reading Breiman's paper about Random Forests ([available 
here|ftp://stat-ftp.berkeley.edu/pub/users/breiman/randomforest2001.pdf]), and 
on page 9 he says:

"Grow the tree using CART methodology to maximum size and do not prune."

So apparently he uses the CART algorithm to grow the trees, and if I'm not 
mistaken, it differs from the algorithm that I described in the wiki 
[http://cwiki.apache.org/MAHOUT/random-forests.html]. The most important 
difference is the way it splits CATEGORICAL attributes:
* in the algorithm that I'm using, a child node is built for each value of the 
attribute
* in CART, a best split value is found (in a similar way to NUMERICAL 
attributes) and only two nodes are built, depending on whether the 
attribute's value is equal to the split value or not
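
To make the difference concrete, here is a minimal Java sketch of the two 
splitting strategies described above. The class name, method names, and the 
use of String[] rows are hypothetical and do not come from the patch. Scoring 
candidate split values with Gini impurity is also an assumption; the comment 
only says that "a best split value is found".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of the Mahout patch.
class CategoricalSplitSketch {

  // Multiway split: one child per distinct value of the attribute.
  static Map<String, List<String[]>> multiwaySplit(List<String[]> rows, int attr) {
    Map<String, List<String[]>> children = new LinkedHashMap<>();
    for (String[] row : rows) {
      children.computeIfAbsent(row[attr], k -> new ArrayList<>()).add(row);
    }
    return children;
  }

  // Binary split: element 0 holds the rows whose value equals splitValue,
  // element 1 holds all other rows.
  static List<List<String[]>> binarySplit(List<String[]> rows, int attr, String splitValue) {
    List<String[]> equal = new ArrayList<>();
    List<String[]> other = new ArrayList<>();
    for (String[] row : rows) {
      (splitValue.equals(row[attr]) ? equal : other).add(row);
    }
    return Arrays.asList(equal, other);
  }

  // Gini impurity of the labels stored in column labelIdx (assumed criterion).
  static double gini(List<String[]> rows, int labelIdx) {
    if (rows.isEmpty()) {
      return 0.0;
    }
    Map<String, Integer> counts = new HashMap<>();
    for (String[] row : rows) {
      counts.merge(row[labelIdx], 1, Integer::sum);
    }
    double impurity = 1.0;
    for (int count : counts.values()) {
      double p = (double) count / rows.size();
      impurity -= p * p;
    }
    return impurity;
  }

  // Tries every observed value as the split value and keeps the one with the
  // lowest size-weighted impurity over the two resulting partitions.
  static String bestSplitValue(List<String[]> rows, int attr, int labelIdx) {
    String best = null;
    double bestScore = Double.POSITIVE_INFINITY;
    for (String value : multiwaySplit(rows, attr).keySet()) {
      double score = 0.0;
      for (List<String[]> part : binarySplit(rows, attr, value)) {
        score += part.size() * gini(part, labelIdx);
      }
      if (score < bestScore) {
        bestScore = score;
        best = value;
      }
    }
    return best;
  }
}
```

With an attribute that takes three values, multiwaySplit yields three 
children, while binarySplit always yields two regardless of how many values 
the attribute has.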

I think the best thing to do is to create an abstract DecisionTreeBuilder 
class. This way we can use whatever implementation we want.
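
A rough sketch of what such an abstraction might look like follows. Only the 
name DecisionTreeBuilder comes from this comment; the build signature, the 
Node and Leaf types, and MajorityLeafBuilder are hypothetical placeholders 
used to show how an implementation would plug in.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical node type: a tree only needs to classify one instance.
interface Node {
  String classify(String[] instance);
}

// Hypothetical leaf that always predicts a fixed label.
class Leaf implements Node {
  private final String label;

  Leaf(String label) {
    this.label = label;
  }

  public String classify(String[] instance) {
    return label;
  }
}

// The proposed abstraction; the method signature is an assumption.
abstract class DecisionTreeBuilder {
  abstract Node build(List<String[]> rows, int labelIdx);
}

// Placeholder implementation: a single leaf holding the majority label.
// A multiway or CART-style builder would extend DecisionTreeBuilder the same way.
class MajorityLeafBuilder extends DecisionTreeBuilder {
  Node build(List<String[]> rows, int labelIdx) {
    Map<String, Integer> counts = new HashMap<>();
    String best = null;
    for (String[] row : rows) {
      int count = counts.merge(row[labelIdx], 1, Integer::sum);
      if (best == null || count > counts.get(best)) {
        best = row[labelIdx];
      }
    }
    return new Leaf(best);
  }
}
```

The forest code would then depend only on DecisionTreeBuilder, so switching 
between the current algorithm and a CART-style one would mean passing in a 
different subclass.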

> Random Forests Reference Implementation
> ---------------------------------------
>
>                 Key: MAHOUT-122
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-122
>             Project: Mahout
>          Issue Type: Task
>          Components: Classification
>    Affects Versions: 0.2
>            Reporter: Deneche A. Hakim
>         Attachments: 2w_patch.diff, RF reference.patch
>
>   Original Estimate: 25h
>  Remaining Estimate: 25h
>
> This is the first step of my GSOC project. Implement a simple, easy to 
> understand, reference implementation of Random Forests (Building and 
> Classification). The only requirement here is that "it works"
