[ 
https://issues.apache.org/jira/browse/MAHOUT-140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim updated MAHOUT-140:
------------------------------------

    Status: Patch Available  (was: Open)

Work in progress...

This is a working implementation. I tested it on a two-node Ubuntu cluster, but
more tests are needed. There is a known issue when the number of maps is high;
I'll try to solve it in the next patch.

The main limitation of this implementation is that each mapper loads its own
copy of the data in memory. Because I don't know how to share the data between
the mappers running on the same slave node, if you launch N maps simultaneously
per cluster node, you'll get N copies of the data in each node's memory!
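
To make that cost concrete, here is a rough sketch of the arithmetic. It is not
code from the patch: the class name, the method name, and the 500 MB dataset
size are made up for illustration, and it only counts the dataset copies, not
the trees or any JVM overhead.

```java
public class InMemoryFootprint {

    /**
     * Bytes used on one node by dataset copies alone, assuming every
     * concurrently running map task holds its own full copy.
     */
    static long perNodeBytes(long datasetBytes, int concurrentMapsPerNode) {
        // multiplyExact throws ArithmeticException instead of silently overflowing
        return Math.multiplyExact(datasetBytes, (long) concurrentMapsPerNode);
    }

    public static void main(String[] args) {
        long datasetBytes = 500L * 1024 * 1024; // hypothetical 500 MB dataset
        int concurrentMaps = 4;                 // hypothetical N maps per node

        long totalMb = perNodeBytes(datasetBytes, concurrentMaps) / (1024 * 1024);
        System.out.println(totalMb + " MB of dataset copies per node"); // prints "2000 MB of dataset copies per node"
    }
}
```

So with these assumed numbers, 4 simultaneous maps on a node need about 2000 MB
just for the data, which is why the number of maps per node has to stay low
until the data can be shared.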

> In-memory mapreduce Random Forests
> ----------------------------------
>
>                 Key: MAHOUT-140
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-140
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>    Affects Versions: 0.2
>            Reporter: Deneche A. Hakim
>            Priority: Minor
>
> Each mapper is responsible for growing a number of trees with a whole copy of
> the dataset loaded in memory. It uses the reference implementation's code to
> build each tree and estimate the out-of-bag (OOB) error.
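
As a rough illustration of the description above, the sketch below shows two
pieces of such a scheme. Both are assumptions for illustration, not the patch's
code: an even split of the forest across mappers (treesForMapper), and a
standard bootstrap draw (bootstrapCounts) in which the instances that are never
drawn for a tree are the out-of-bag ones used to estimate its error. All names
are hypothetical.

```java
import java.util.Random;

public class BaggingSketch {

    /**
     * Number of trees one mapper grows when totalTrees are split as evenly
     * as possible across numMappers (assumed partitioning scheme).
     */
    static int treesForMapper(int totalTrees, int numMappers, int mapperId) {
        int base = totalTrees / numMappers;
        int remainder = totalTrees % numMappers;
        // the first 'remainder' mappers take one extra tree each
        return base + (mapperId < remainder ? 1 : 0);
    }

    /**
     * Draws n instance indices with replacement and returns how many times
     * each instance was drawn. A count of 0 means the instance is out-of-bag
     * for this tree.
     */
    static int[] bootstrapCounts(int n, Random rng) {
        int[] counts = new int[n];
        for (int i = 0; i < n; i++) {
            counts[rng.nextInt(n)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int datasetSize = 10;          // hypothetical in-memory dataset size
        Random rng = new Random(42L);  // fixed seed only to make runs repeatable

        int trees = treesForMapper(10, 3, 0); // mapper 0 of 3, forest of 10 trees
        for (int t = 0; t < trees; t++) {
            int[] counts = bootstrapCounts(datasetSize, rng);
            int oob = 0;
            for (int c : counts) {
                if (c == 0) {
                    oob++;
                }
            }
            System.out.println("tree " + t + ": " + oob + " out-of-bag instances");
        }
    }
}
```

Because every mapper holds the whole dataset, each one can draw its bootstrap
samples and evaluate its out-of-bag instances locally, without talking to the
other mappers.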

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
