[ https://issues.apache.org/jira/browse/MAHOUT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Deneche A. Hakim updated MAHOUT-122:
------------------------------------

    Attachment: 3w_patch.diff

*3rd Week Patch*
Work in progress...

*Changes*
* ForestBuilder becomes an object that uses a TreeBuilder object
* RandomForest represents a... guess what! It has methods to classify single instances or batches of data. It also contains methods to compute the total and mean number of nodes and the mean max depth of the trees
* Added more PredictionCallback implementations:
** MeanTreeCollector computes the mean classification error among all the trees of the forest
** MultiCallback allows many callbacks to be passed to the same classification method
* BreimanExample is a running example similar to the testing procedures used in Breiman's paper about Random Forests
* MemoryUsage is a running app used to collect stats about memory usage
* DataSplit, a temporary app, can split the KDD dataset (1%, 10%, 25%, 50%)
* TreeBuilder is an abstract class that builds a Decision Tree given a Data instance
* DefaultTreeBuilder is an implementation of TreeBuilder based on Andrew W. Moore's Decision Trees tutorial

*What's next*
* Some more memory usage tests
* I think it's time to start on the map-reduce implementation; the results of the memory usage tests should help us decide which implementation to pursue

> Random Forests Reference Implementation
> ---------------------------------------
>
>                 Key: MAHOUT-122
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-122
>             Project: Mahout
>          Issue Type: Task
>          Components: Classification
>    Affects Versions: 0.2
>            Reporter: Deneche A. Hakim
>         Attachments: 2w_patch.diff, 3w_patch.diff, RF reference.patch
>
>   Original Estimate: 25h
>  Remaining Estimate: 25h
>
> This is the first step of my GSOC project. Implement a simple, easy to
> understand, reference implementation of Random Forests (Building and
> Classification). The only requirement here is that "it works"

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
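
For readers without the patch at hand, the PredictionCallback items in the Changes list can be sketched roughly as follows. This is only an illustration and is not taken from 3w_patch.diff: the class names PredictionCallback, MultiCallback and MeanTreeCollector come from the comment above, but the method signature, the constructor arguments, the meanTreeError() method and the CallCounter helper are all assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class CallbackSketch {

  /** Assumed callback shape: invoked once per (tree, instance) prediction. */
  interface PredictionCallback {
    void prediction(int treeId, int instanceId, int prediction);
  }

  /** Forwards every prediction to each wrapped callback, in order. */
  static final class MultiCallback implements PredictionCallback {
    private final List<PredictionCallback> callbacks = new ArrayList<>();

    MultiCallback(PredictionCallback... callbacks) {
      for (PredictionCallback callback : callbacks) {
        this.callbacks.add(callback);
      }
    }

    @Override
    public void prediction(int treeId, int instanceId, int prediction) {
      for (PredictionCallback callback : callbacks) {
        callback.prediction(treeId, instanceId, prediction);
      }
    }
  }

  /** Tracks each tree's error rate and averages it over the trees (assumed semantics). */
  static final class MeanTreeCollector implements PredictionCallback {
    private final int[] labels;  // true label of each instance
    private final int[] errors;  // misclassifications per tree
    private final int[] counts;  // predictions seen per tree

    MeanTreeCollector(int[] labels, int nbTrees) {
      this.labels = labels.clone();
      this.errors = new int[nbTrees];
      this.counts = new int[nbTrees];
    }

    @Override
    public void prediction(int treeId, int instanceId, int prediction) {
      counts[treeId]++;
      if (prediction != labels[instanceId]) {
        errors[treeId]++;
      }
    }

    /** Mean of the per-tree error rates; NaN if no tree has predicted anything. */
    double meanTreeError() {
      double sum = 0.0;
      int used = 0;
      for (int t = 0; t < counts.length; t++) {
        if (counts[t] > 0) {
          sum += (double) errors[t] / counts[t];
          used++;
        }
      }
      return used == 0 ? Double.NaN : sum / used;
    }
  }

  /** Hypothetical second callback, only here to show the fan-out. */
  static final class CallCounter implements PredictionCallback {
    int calls;

    @Override
    public void prediction(int treeId, int instanceId, int prediction) {
      calls++;
    }
  }

  public static void main(String[] args) {
    int[] labels = {0, 1, 1};
    int[][] treePredictions = {{0, 1, 0}, {0, 1, 1}};  // [tree][instance], made-up data

    MeanTreeCollector collector = new MeanTreeCollector(labels, treePredictions.length);
    CallCounter counter = new CallCounter();
    PredictionCallback callback = new MultiCallback(collector, counter);

    for (int t = 0; t < treePredictions.length; t++) {
      for (int i = 0; i < labels.length; i++) {
        callback.prediction(t, i, treePredictions[t][i]);
      }
    }
    System.out.println("calls = " + counter.calls);
    System.out.println("mean tree error = " + collector.meanTreeError());
  }
}
```

The point of the composite is that the classification method only needs to accept a single callback, while the caller decides how many collectors hang off it.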