[ https://issues.apache.org/jira/browse/MAHOUT-140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Deneche A. Hakim updated MAHOUT-140:
------------------------------------

    Attachment: mapred_jul12.diff

*Changes*
* The oob error estimation has been rewritten and is now much faster
* BuildForest has an optional argument '-o' to use the optimized IG calculations

I tested the implementation on Amazon EC2:
* on a 1 small instance cluster (1 master + 1 slave), building 50 trees with KDD10% takes 44m 45s
* on a 10 small instances cluster (1 master + 10 slaves), building 50 trees with KDD10% takes 7m 50s

*What's next*
* Although many improvements are possible, the current InMem implementation does a good job. I shall start coding the other mapreduce variant, where each mapper grows its trees using only the subset of the data available to it

> In-memory mapreduce Random Forests
> ----------------------------------
>
>                 Key: MAHOUT-140
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-140
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>    Affects Versions: 0.2
>            Reporter: Deneche A. Hakim
>            Priority: Minor
>         Attachments: mapred_jul12.diff, mapred_patch.diff
>
>
> Each mapper is responsible for growing a number of trees with a whole copy of
> the dataset loaded in memory. It uses the reference implementation's code to
> build each tree and estimate the oob error.
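
For reference, the two EC2 timings reported above can be turned into a speedup and a per-slave efficiency figure. The snippet below is only a back-of-the-envelope sketch and is not part of the patch: the helper name `to_seconds` is made up for illustration, and it assumes the two runs differ only in the number of slaves (1 vs. 10).

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'XXm YYs' timing into seconds (hypothetical helper)."""
    return minutes * 60 + seconds

# Timings copied from the message: 50 trees on KDD10%.
one_slave = to_seconds(44, 45)   # 1 master + 1 slave
ten_slaves = to_seconds(7, 50)   # 1 master + 10 slaves

# Speedup of the 10-slave run relative to the 1-slave run.
speedup = one_slave / ten_slaves

# Parallel efficiency: speedup divided by the increase in slave count (10x).
efficiency = speedup / 10

print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
```

Under these assumptions the 10-slave cluster is roughly 5.7 times faster, i.e. about 57% of ideal linear scaling. A plausible reason, given the description quoted above, is that every mapper loads a whole copy of the dataset before growing its trees, so that fixed cost is not reduced by adding slaves; this has not been measured here.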