[ https://issues.apache.org/jira/browse/MAHOUT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717024#action_12717024 ]
Deneche A. Hakim commented on MAHOUT-122:
-----------------------------------------

I've been reading Breiman's paper about Random Forests ([available here|ftp://stat-ftp.berkeley.edu/pub/users/breiman/randomforest2001.pdf]), and on page 9 he says:

"Grow the tree using CART methodology to maximum size and do not prune."

So apparently he uses the CART algorithm to grow the trees, and if I'm not wrong, it differs from the algorithm that I described in the wiki [http://cwiki.apache.org/MAHOUT/random-forests.html]. The most important difference is the way it splits CATEGORICAL attributes:
* in the algorithm that I'm using, a node is built for each value of the attribute
* in CART, a best split value is found (in a similar way to NUMERICAL attributes) and only two nodes are built, depending on whether the attribute's value is equal to the split value or not

I think that the best thing to do is to create an abstract DecisionTreeBuilder class; this way we can use whatever implementation we want.

> Random Forests Reference Implementation
> ---------------------------------------
>
>                 Key: MAHOUT-122
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-122
>             Project: Mahout
>          Issue Type: Task
>          Components: Classification
>    Affects Versions: 0.2
>            Reporter: Deneche A. Hakim
>         Attachments: 2w_patch.diff, RF reference.patch
>
>   Original Estimate: 25h
>  Remaining Estimate: 25h
>
> This is the first step of my GSOC project. Implement a simple, easy to understand, reference implementation of Random Forests (Building and Classification). The only requirement here is that "it works"
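
To make the difference between the two ways of splitting CATEGORICAL attributes concrete, here is a minimal sketch. It is not code from the patch or from Mahout: the class CategoricalSplitSketch and all of its method names are hypothetical, rows are assumed to be plain String arrays with the label in one column, and Gini impurity is assumed as the split score (the comment does not say which measure is used). The binary split follows the "equal or not to the split value" description given in the comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not Mahout code): two ways to split on a categorical attribute. */
final class CategoricalSplitSketch {

  /** Multiway split: one child node per distinct value of the attribute. */
  static Map<String, List<String[]>> multiwaySplit(List<String[]> rows, int attr) {
    Map<String, List<String[]>> children = new LinkedHashMap<>();
    for (String[] row : rows) {
      children.computeIfAbsent(row[attr], k -> new ArrayList<>()).add(row);
    }
    return children;
  }

  /** Binary split: first child holds rows whose attribute equals value, second child the rest. */
  static List<List<String[]>> binarySplit(List<String[]> rows, int attr, String value) {
    List<String[]> equal = new ArrayList<>();
    List<String[]> other = new ArrayList<>();
    for (String[] row : rows) {
      (row[attr].equals(value) ? equal : other).add(row);
    }
    return Arrays.asList(equal, other);
  }

  /** Gini impurity of the labels in rows (assumed split score): 1 - sum of squared class shares. */
  static double gini(List<String[]> rows, int labelIdx) {
    if (rows.isEmpty()) {
      return 0.0;
    }
    Map<String, Integer> counts = new HashMap<>();
    for (String[] row : rows) {
      counts.merge(row[labelIdx], 1, Integer::sum);
    }
    double sumSq = 0.0;
    for (int c : counts.values()) {
      double p = (double) c / rows.size();
      sumSq += p * p;
    }
    return 1.0 - sumSq;
  }

  /** Tries every observed value and returns the one whose binary split has the lowest weighted impurity. */
  static String bestSplitValue(List<String[]> rows, int attr, int labelIdx) {
    String best = null;
    double bestScore = Double.POSITIVE_INFINITY;
    for (String value : multiwaySplit(rows, attr).keySet()) {
      double score = 0.0;
      for (List<String[]> part : binarySplit(rows, attr, value)) {
        score += part.size() * gini(part, labelIdx);
      }
      score /= rows.size();
      if (score < bestScore) {
        bestScore = score;
        best = value;
      }
    }
    return best;
  }
}
```

With an attribute that takes three values, multiwaySplit produces three children while binarySplit always produces two. An abstract DecisionTreeBuilder, as proposed in the comment, could presumably hide this choice behind one method so that either strategy can be plugged in.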