[ https://issues.apache.org/jira/browse/MAHOUT-140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Deneche A. Hakim updated MAHOUT-140:
------------------------------------

    Status: Patch Available  (was: Open)

Work in progress...

A working implementation. I tested it on a cluster of two Ubuntu machines, but more tests are needed. There is a known issue when the number of maps is high; I'll try to solve it in the next patch.

The main limitation of this implementation is that each mapper loads a copy of the data in memory. Because I don't know how to share the data between the mappers of the same slave node, if you launch N maps simultaneously per cluster node, you'll get N copies of the data in each node's memory!

> In-memory mapreduce Random Forests
> ----------------------------------
>
>                 Key: MAHOUT-140
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-140
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>    Affects Versions: 0.2
>            Reporter: Deneche A. Hakim
>            Priority: Minor
>
> Each mapper is responsible for growing a number of trees with a whole copy of
> the dataset loaded in memory. It uses the reference implementation's code to
> build each tree and estimate the out-of-bag (oob) error.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
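
To make the memory limitation described in the comment concrete, here is a minimal sketch of the arithmetic involved. It is not code from the patch: the class and method names (InMemoryForestSketch, treesPerMapper, datasetBytesPerNode) are hypothetical, and the even split of trees across mappers is an assumption, since the issue only says that each mapper grows "a number of trees".

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names come from Mahout or the patch.
public class InMemoryForestSketch {

    /**
     * Assumed partitioning: split totalTrees as evenly as possible across mappers,
     * giving one extra tree to the first (totalTrees % numMappers) mappers.
     */
    static int[] treesPerMapper(int totalTrees, int numMappers) {
        if (totalTrees < 0 || numMappers <= 0) {
            throw new IllegalArgumentException("totalTrees must be >= 0 and numMappers > 0");
        }
        int[] counts = new int[numMappers];
        int base = totalTrees / numMappers;
        int extra = totalTrees % numMappers;
        for (int i = 0; i < numMappers; i++) {
            counts[i] = base + (i < extra ? 1 : 0);
        }
        return counts;
    }

    /**
     * Memory taken by dataset copies on one node when every concurrently
     * running map task loads its own full copy (the limitation noted above).
     */
    static long datasetBytesPerNode(long datasetBytes, int concurrentMapsPerNode) {
        return datasetBytes * concurrentMapsPerNode;
    }

    public static void main(String[] args) {
        // 100 trees over 8 mappers
        System.out.println(Arrays.toString(treesPerMapper(100, 8)));

        // A 512 MiB dataset with 4 concurrent maps on one node
        long bytes = datasetBytesPerNode(512L * 1024 * 1024, 4);
        System.out.println(bytes / (1024 * 1024) + " MiB of dataset copies per node");
    }
}
```

Under these assumptions the per-node cost grows linearly with the number of concurrent maps, so lowering the number of simultaneous map slots per node is the only way to bound it until the data can be shared between mappers.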