--- Phil Steitz <[EMAIL PROTECTED]> wrote:
> Al Chou wrote:
> >>Date: Wed, 04 Jun 2003 21:05:14 -0700
> >>From: Phil Steitz <[EMAIL PROTECTED]>
> >>Subject: [math] more improvement to storage free mean, variance computation
> >>
> >>Check out procedure sum.2 and var.2 in
> >>
> >>http://www.stanford.edu/~glynn/PDF/0208.pdf
> >>
> >>The first looks like Brent's suggestion for a corrected mean
> >>computation, with no memory required. The additional computational cost
> >>that I complained about is documented to be 3x the flops cost of the
> >>direct computation, but the computation is claimed to be more stable. So
> >>the question is: do we pay the flops cost to get the numerical
> >>stability? The example in the paper is compelling; but it uses small
> >>words (err, numbers I mean -- sorry, slipped into my native Fortran for
> >>a moment there ;-)). So how do we go about deciding whether the
> >>stability in the mean computation is worth the increased computational
> >>effort? I would prefer not to answer "let the user decide". To make
> >>the decision harder, we should note that it is actually worse than 3x,
> >>since in the no-storage version, the user may request the mean only
> >>rarely (if at all), and the 3x comparison is against computing the mean
> >>for each value added.
> >>
> >>The variance formula looks better than what we have now, still requiring
> >>no memory. Should we implement this for the no-storage case?
> >
> > After implementing var.2 from the Stanford paper in UnivariateImpl and
> > scratching my head for some time over why the variance calculation failed
> > its JUnit test case, I realized there's a flaw in var.2 that, to my
> > surprise, no one seems to talk about. To update the variance (called S
> > in the paper), the formula calculates
> >
> >     z = y / i
> >     S = S + (i-1) * y * z
> >
> > where i is the number of data values (including the value just being
> > added to the collection).
> > It doesn't really matter how y is defined, because you will notice that
> >
> >     S = S + (i-1) * y * y / i
> >       = S + (i-1) * y**2 / i
> >
> > which means that S can never decrease in magnitude (for real data, which
> > is what we're talking about). But for the simple case of three data
> > values {1, 2, 2} in the JUnit test case, the variance decreases between
> > the addition of the second and third data values.
> >
> > Can anyone point out what I'm missing here?
> >
>
> I think that is OK, since if you look at the definition of S earlier in
> the paper, S is not the variance, it is the sum of the squared
> deviations from the mean. That should always be non-decreasing.
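[Editor's note: a minimal, hypothetical Java sketch of the no-storage updating scheme being discussed (not the actual UnivariateImpl code). It shows why the puzzle above resolves: S accumulates the sum of squared deviations from the running mean, which never decreases, while the reported variance S/(n-1) can. The class and method names are invented for illustration.]

```java
public class RunningVariance {
    private long n = 0;        // count of values seen so far
    private double mean = 0;   // running (corrected) mean
    private double s = 0;      // S: sum of squared deviations from the mean

    public void addValue(double x) {
        n++;
        double y = x - mean;   // deviation from the previous mean
        double z = y / n;      // z = y / i, as in the paper's var.2
        s += (n - 1) * y * z;  // S = S + (i-1) * y * z  (never decreases)
        mean += z;             // corrected mean update (sum.2 style)
    }

    public double getMean() {
        return n > 0 ? mean : Double.NaN;
    }

    public double getVariance() {
        // Sample variance: S divided by (n - 1), NOT S itself
        return n > 1 ? s / (n - 1) : Double.NaN;
    }

    public static void main(String[] args) {
        RunningVariance rv = new RunningVariance();
        rv.addValue(1);
        rv.addValue(2);  // S = 1/2, variance of {1, 2} = 1/2
        rv.addValue(2);  // S grows to 2/3, but variance of {1, 2, 2} = 1/3
        System.out.println(rv.getMean() + " " + rv.getVariance());
    }
}
```

Running this on the thread's test data {1, 2, 2} shows S increasing from 1/2 to 2/3 while the variance drops from 1/2 to 1/3, which is exactly the behavior the JUnit test observed.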
Where is that definition? I'm looking at equations 3 and 4, which define
S_{1,q} (in LaTeX notation), and the return statement in algorithm
Procedure var.2, which says S_{1,q} = S. Anyway, I think the resolution is
contained in messages to follow shortly.

Al

=====
Albert Davidson Chou
Get answers to Mac questions at http://www.Mac-Mgrs.org/ .