[ 
https://issues.apache.org/jira/browse/MAHOUT-945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ikumasa Mukai updated MAHOUT-945:
---------------------------------

    Attachment: MAHOUT-945.patch

Hi,
I made a new patch that addresses Wang-san's point. Thank you, Wang-san.

In this patch, I use FullRunningAverageAndStdDev instead of my own code to 
calculate the variances.
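
To illustrate what a running average and standard deviation accumulator does, here is a minimal sketch using Welford's online algorithm. It is only an assumption about the general technique, not the code in FullRunningAverageAndStdDev or in the attached patch; the class name RunningStats and its method names are hypothetical.

```java
// Hypothetical accumulator for illustration only; not Mahout's class.
final class RunningStats {
  private long count;   // number of values seen so far
  private double mean;  // running mean
  private double m2;    // running sum of squared deviations from the mean

  // Welford's update: adjusts the mean and m2 in a single pass.
  void add(double x) {
    count++;
    double delta = x - mean;
    mean += delta / count;
    m2 += delta * (x - mean);
  }

  long count() {
    return count;
  }

  double mean() {
    return mean;
  }

  // Variance with divisor n (population variance).
  double populationVariance() {
    return count > 0 ? m2 / count : Double.NaN;
  }

  // Variance with divisor n - 1 (sample variance).
  double sampleVariance() {
    return count > 1 ? m2 / (count - 1) : Double.NaN;
  }

  public static void main(String[] args) {
    RunningStats stats = new RunningStats();
    for (double v : new double[] {1.0, 2.0, 3.0}) {
      stats.add(v);
    }
    System.out.println("mean = " + stats.mean());
    System.out.println("population variance = " + stats.populationVariance());
    System.out.println("sample variance = " + stats.sampleVariance());
  }
}
```

Updating the statistics value by value avoids keeping separate sum and sum-of-squares arrays, and it is less prone to cancellation error than subtracting two large sums.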

For performance, this patch also includes a modification to 
FullRunningAverageAndStdDev.

It would be nice if you could check whether the modification is acceptable.

Regards,
                
> The variance calculation of Random forest regression tree
> ---------------------------------------------------------
>
>                 Key: MAHOUT-945
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-945
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.6
>            Reporter: Wang Yue
>              Labels: Regressionsplit.java
>         Attachments: MAHOUT-945.patch, MAHOUT-945.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Hi, Mukai
>   Thanks for your efforts in extending the RF to regression. However, I have a 
> doubt about your implementation in Regressionsplit.java. The 
> variance method is:
> "
>  private static double variance(double[] s, double[] ss, double[] dataSize) {
>     double var = 0;
>     for (int i = 0; i < s.length; i++) {
>       if (dataSize[i] > 0) {
>         var += ss[i] - ((s[i] * s[i]) / dataSize[i]);
>       }
>     }
>     return var;
>   }
> "
> In my mind, the variance should instead be something like 
> var += ss[i]/dataSize[i] - ((s[i] * s[i]) / (dataSize[i]*dataSize[i]));
> Please correct me if I am wrong. Thanks.
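
For reference, the two expressions in the issue description can be compared side by side. The sketch below assumes that s is the sum of the values, ss is the sum of their squares, and n is the number of values in one partition; the class and method names are hypothetical and are not part of Mahout. Under that assumption the first expression is the sum of squared deviations from the mean, and the second is the population variance, so the first equals n times the second.

```java
// Hypothetical helper for illustration only; not part of Mahout.
final class VarianceComparison {

  // Expression from the quoted variance method: ss - s^2 / n.
  // This is the sum of squared deviations from the mean.
  static double sumOfSquaredDeviations(double s, double ss, double n) {
    return ss - (s * s) / n;
  }

  // Expression proposed in the issue: ss / n - (s / n)^2.
  // This is the population variance (divisor n).
  static double populationVariance(double s, double ss, double n) {
    double mean = s / n;
    return ss / n - mean * mean;
  }

  public static void main(String[] args) {
    double[] values = {1.0, 2.0, 3.0};
    double s = 0.0;
    double ss = 0.0;
    for (double v : values) {
      s += v;        // running sum
      ss += v * v;   // running sum of squares
    }
    double n = values.length;

    double ssd = sumOfSquaredDeviations(s, ss, n);
    double var = populationVariance(s, ss, n);
    System.out.println("sum of squared deviations = " + ssd);
    System.out.println("population variance       = " + var);
    System.out.println("n * population variance   = " + n * var);
  }
}
```

Which quantity is appropriate depends on how the split score is meant to weight partitions of different sizes: summing the first expression weights each partition by its size, while summing the second treats all partitions equally.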

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
