[ 
https://issues.apache.org/jira/browse/MAHOUT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720847#action_12720847
 ] 

Ted Dunning commented on MAHOUT-122:
------------------------------------


Just for grins, I plotted the max memory usage versus number of instances.  The 
relationship (so far) is very much a straight line.  The fit that R gives me is 
450 (se=25) bytes per data instance with a fixed overhead of 23MB (se=14MB).  
The fixed overhead can't necessarily be distinguished from zero, but it looks 
right.  
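The fit described above is an ordinary least-squares line, memory = overhead + slope * instances, with standard errors for both coefficients. Here is a minimal pure-Python sketch of that kind of fit, assuming plain OLS with n - 2 residual degrees of freedom (what R's `lm` reports for a simple regression). `linear_fit` is a hypothetical helper, and the data in the usage lines is synthetic, not the measured runs. The "can't be distinguished from zero" remark follows from the intercept's t-ratio: 23/14 is about 1.64, below the usual ~2 cutoff.

```python
import math


def linear_fit(xs, ys):
    """OLS fit of y = a + b*x; returns (a, b, se_a, se_b)."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate standard errors")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope: bytes per instance
    a = mean_y - b * mean_x            # intercept: fixed overhead
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sigma2 = rss / (n - 2)
    se_b = math.sqrt(sigma2 / sxx)
    se_a = math.sqrt(sigma2 * (1.0 / n + mean_x ** 2 / sxx))
    return a, b, se_a, se_b


if __name__ == "__main__":
    # Synthetic, illustrative points only (not the actual measurements).
    instances = [100_000, 200_000, 400_000, 800_000]
    noise = [1_000, -2_000, 1_500, -500]
    peak_bytes = [23e6 + 450 * n + e for n, e in zip(instances, noise)]
    a, b, se_a, se_b = linear_fit(instances, peak_bytes)
    print(f"slope={b:.2f} B/instance (se={se_b:.3f}), "
          f"overhead={a / 1e6:.2f} MB (se={se_a / 1e6:.4f} MB), t={a / se_a:.1f}")
```

With real measurements, the intercept's t-ratio (`a / se_a`) is the quick check on whether the fixed overhead differs from zero.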

The 450-byte overhead per training instance seems a little high, but I 
don't know the data well, so it might be pretty reasonable.  The original data 
size was about 100 bytes per instance.
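Plugging the reported coefficients into the linear model gives a rough peak-memory estimate for a given training-set size, along with the in-memory expansion over the raw data. This is a back-of-the-envelope sketch. It assumes "MB" means 10^6 bytes and that the ~100-byte figure is per instance. The helper names are hypothetical, and the fixed overhead is uncertain (se = 14 MB).

```python
# Coefficients taken from the fit reported above.
# Assumption: "MB" = 10**6 bytes, and the raw size of ~100 bytes is per instance.
FIXED_OVERHEAD_BYTES = 23 * 10**6
BYTES_PER_INSTANCE = 450
RAW_BYTES_PER_INSTANCE = 100


def estimated_peak_bytes(n_instances: int) -> int:
    """Peak memory predicted by the linear fit: overhead + slope * n."""
    return FIXED_OVERHEAD_BYTES + BYTES_PER_INSTANCE * n_instances


def expansion_factor() -> float:
    """In-memory bytes per instance relative to the raw data size."""
    return BYTES_PER_INSTANCE / RAW_BYTES_PER_INSTANCE


if __name__ == "__main__":
    for n in (100_000, 1_000_000, 10_000_000):
        print(f"{n:>10,d} instances -> ~{estimated_peak_bytes(n) / 1e6:,.0f} MB")
    print(f"expansion over raw data: {expansion_factor():.1f}x")
```

For example, one million instances come out at about 23 MB + 450 MB = 473 MB. The 4.5x expansion over the raw data is what makes the per-instance cost look high.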


> Random Forests Reference Implementation
> ---------------------------------------
>
>                 Key: MAHOUT-122
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-122
>             Project: Mahout
>          Issue Type: Task
>          Components: Classification
>    Affects Versions: 0.2
>            Reporter: Deneche A. Hakim
>         Attachments: 2w_patch.diff, 3w_patch.diff, RF reference.patch
>
>   Original Estimate: 25h
>  Remaining Estimate: 25h
>
> This is the first step of my GSOC project. Implement a simple, easy to 
> understand, reference implementation of Random Forests (Building and 
> Classification). The only requirement here is that "it works"

-- 
This message is automatically generated by JIRA.