Why not just use an OnlineAccumulator? Why duplicate code?

On Sun, Jan 15, 2012 at 11:59 AM, Wang Yue (Commented) (JIRA) <[email protected]> wrote:
>
> [
> https://issues.apache.org/jira/browse/MAHOUT-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186485#comment-13186485]
>
> Wang Yue commented on MAHOUT-945:
> ---------------------------------
>
> Hi, Ikumaso Mukai,
> Thanks for your improvement, I realize that you actually implement the
> new online version of variance calculation according to
> http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance,
> however, the problem I indicate still exists, that is, the final variance
> should divide by n(which is sample size.) So, I would suggest to modify the
> third last line of following code, do you think so?
>
> +  /**
> +   * Calculator for variance calculation
> +   */
> +  private static class VarianceCalculator {
> +
> +    private int n;
> +    private double mean;
> +    private double var;
> +
> +    void add(double value) {
> +      n++;
> +      double oldMean = mean;
> +      mean += (value - mean) / n;
> +      double diff = (value - mean) * (value - oldMean);
> +      var += diff;
> +    }
> +
> +    double getVariance() {
> +      return var/n; //// suggested by Wang Yue
> +    }
> +  }
>
> > The variance calculation of Random forest regression tree
> > ---------------------------------------------------------
> >
> >                 Key: MAHOUT-945
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-945
> >             Project: Mahout
> >          Issue Type: Improvement
> >          Components: Classification
> >    Affects Versions: 0.6
> >            Reporter: Wang Yue
> >              Labels: Regressionsplit.java
> >         Attachments: MAHOUT-945.patch
> >
> >   Original Estimate: 48h
> >  Remaining Estimate: 48h
> >
> > Hi, Mukai
> > Thanks for your efforts in expand the RF to regression. However, I
> > have a doubt about your implementation regarding to Regressionsplit.java.
> > The variance method
> > "
> >   private static double variance(double[] s, double[] ss, double[] dataSize) {
> >     double var = 0;
> >     for (int i = 0; i < s.length; i++) {
> >       if (dataSize[i] > 0) {
> >         var += ss[i] - ((s[i] * s[i]) / dataSize[i]);
> >       }
> >     }
> >     return var;
> >   }
> > "
> > While the variance in my mind should be something like
> > var += ss[i]/dataSize[i] - ((s[i] * s[i]) / (dataSize[i]*dataSize[i]));
> > Please help correct me if I am wrong. Thanks
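
For reference, the two formulas discussed in the quoted thread can be compared side by side. The following is only a minimal sketch, not code from the patch or from Mahout: the class name VarianceSketch, the nested RunningVariance class, and the sample data are all hypothetical. It assumes the goal is the population variance (divide by n), as the quoted comment suggests. Note that the undivided quantity ss - s*s/n from the original variance() method is n times this value.

```java
public class VarianceSketch {

  /** Streaming accumulator using the same update as the quoted patch (hypothetical name). */
  static final class RunningVariance {
    private long n;
    private double mean;
    private double m2; // running sum of squared deviations from the mean

    void add(double value) {
      n++;
      double oldMean = mean;
      mean += (value - mean) / n;
      m2 += (value - mean) * (value - oldMean);
    }

    /** Population variance: divides by n, as suggested in the quoted comment. */
    double populationVariance() {
      return n > 0 ? m2 / n : 0.0;
    }
  }

  /** The same quantity computed from a running sum s and sum of squares ss over n values. */
  static double populationVariance(double s, double ss, long n) {
    return n > 0 ? ss / n - (s * s) / ((double) n * n) : 0.0;
  }

  public static void main(String[] args) {
    double[] data = {2, 4, 4, 4, 5, 5, 7, 9}; // illustrative values only

    RunningVariance acc = new RunningVariance();
    double s = 0;
    double ss = 0;
    for (double v : data) {
      acc.add(v);
      s += v;
      ss += v * v;
    }

    // The two results should agree up to floating-point rounding.
    System.out.println("streaming:        " + acc.populationVariance());
    System.out.println("sum / sum-of-sq.: " + populationVariance(s, ss, data.length));
  }
}
```

The streaming form is generally the more numerically stable of the two when the values are large relative to their spread, since it avoids subtracting two nearly equal sums.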
