[
https://issues.apache.org/jira/browse/MAHOUT-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186881#comment-13186881
]
Ikumasa Mukai commented on MAHOUT-945:
--------------------------------------
Hi wang-san.
Thank you for your comment, and sorry for not replying to your point.
However, for just building trees, I think it is better not to divide by "n".
Dividing by "n" introduces additional rounding error when calculating the
gains, and omitting the division does not change the logic.
It would be great if you could check this.
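As a rough way to check this, here is a minimal sketch. It is not the Mahout
code: the class name, the method names, and the toy labels are made up for
illustration. It assumes that dividing by "n" means weighting each partition's
variance by its share of the node's data. Under that assumption the weighted
variance is just the undivided sum divided by the node size N, which is the
same for every candidate split of a node, so both quantities rank the
candidate splits in the same order.

```java
class SseVersusVariance {

    // Sum of squared deviations from the mean, computed from the running
    // sums s = sum(y) and ss = sum(y^2), without a final division by n.
    public static double sse(double s, double ss, double n) {
        return n > 0 ? ss - (s * s) / n : 0.0;
    }

    // Population variance from the same sums; equal to sse / n.
    public static double variance(double s, double ss, double n) {
        return n > 0 ? ss / n - (s * s) / (n * n) : 0.0;
    }

    // Undivided total over the partitions of one candidate split.
    public static double totalSse(double[] s, double[] ss, double[] n) {
        double total = 0;
        for (int i = 0; i < s.length; i++) {
            total += sse(s[i], ss[i], n[i]);
        }
        return total;
    }

    // Size-weighted variance: sum over partitions of (n_i / N) * var_i.
    public static double weightedVariance(double[] s, double[] ss, double[] n) {
        double nodeSize = 0;
        for (double v : n) {
            nodeSize += v;
        }
        double total = 0;
        for (int i = 0; i < s.length; i++) {
            total += (n[i] / nodeSize) * variance(s[i], ss[i], n[i]);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical node with labels {1, 2, 3, 10, 11, 12}.
        // Split A: {1, 2, 3} | {10, 11, 12}
        double[] sA = {6, 33}, ssA = {14, 365}, nA = {3, 3};
        // Split B: {1, 2, 3, 10} | {11, 12}
        double[] sB = {16, 23}, ssB = {114, 265}, nB = {4, 2};

        System.out.println("A: sse=" + totalSse(sA, ssA, nA)
                + " weightedVar=" + weightedVariance(sA, ssA, nA));
        System.out.println("B: sse=" + totalSse(sB, ssB, nB)
                + " weightedVar=" + weightedVariance(sB, ssB, nB));
    }
}
```

Note that summing the per-partition variances without the size weights is a
different quantity and could rank splits differently.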
Regards,
> The variance calculation of Random forest regression tree
> ---------------------------------------------------------
>
> Key: MAHOUT-945
> URL: https://issues.apache.org/jira/browse/MAHOUT-945
> Project: Mahout
> Issue Type: Improvement
> Components: Classification
> Affects Versions: 0.6
> Reporter: Wang Yue
> Labels: Regressionsplit.java
> Attachments: MAHOUT-945.patch
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> Hi, Mukai
> Thanks for your efforts in extending the RF to regression. However, I have a
> doubt about your implementation in Regressionsplit.java. The
> variance method is
> "
> private static double variance(double[] s, double[] ss, double[] dataSize) {
>   double var = 0;
>   for (int i = 0; i < s.length; i++) {
>     if (dataSize[i] > 0) {
>       var += ss[i] - ((s[i] * s[i]) / dataSize[i]);
>     }
>   }
>   return var;
> }
> "
> In my understanding, however, the variance should be something like
>   var += ss[i]/dataSize[i] - ((s[i] * s[i]) / (dataSize[i]*dataSize[i]));
> Please correct me if I am wrong. Thanks