Re: [math] more improvement to storage free mean, variance computation

Al Chou Sat, 14 Jun 2003 12:15:35 -0700

--- "Mark R. Diggory" <[EMAIL PROTECTED]> wrote:
> Al Chou wrote:
> > After implementing var.2 from the Stanford paper in UnivariateImpl and
> > scratching my head for some time over why the variance calculation failed
> > its JUnit test case, I realized there's a flaw in var.2 that I can't
> > understand no one talks about.  To update the variance (called S in the
> > paper), the formula calculates
> > 
> > z = y / i
> > S = S + (i-1) * y * z
> > 
> > where i is the number of data values (including the value just being added
> > to the collection).  It doesn't really matter how y is defined, because you
> > will notice that
> > 
> > S = S + (i-1) * y * y / i
> >   = S + (i-1) * y**2 / i
> > 
> > which means that S can never decrease in magnitude (for real data, which is
> > what we're talking about).  But for the simple case of three data values
> > {1, 2, 2} in the JUnit test case, the variance decreases between the
> > addition of the second and third data values.
> > 
> > Can anyone point out what I'm missing here?
> 
> Al, I see what you're saying. I wrote a little example case to implement
> the pseudocode they have in the paper:
> 
> public class SmallTest {
> 
>      public static void main(String[] args) {
>          double[] vals = new double[] { 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0 };
> 
>          double m = vals[0];
>          double s = 0.0;
> 
>          System.out.println("m=" + m);
>          System.out.println("s=" + s);
>          System.out.println("");
> 
>          for (int i = 2; i <= vals.length; i++) {
> 
>              double y = vals[i-1] - m;
>              double z = y / i;
>              m += z;
>              s += (i - 1) * y * z;
> 
>              System.out.println("y=" + y);
>              System.out.println("z=" + z);
>              System.out.println("m=" + m);
>              System.out.println("s=" + s);
>              System.out.println("");
>          }
>      }
> }
> 
> s does seem to increase even though the variance should be going down.
> 
> I want us to review this paper further and go back to the research of
> 
> Hanson, R. J. 1975. Stably updating mean and standard
> deviation of data. Communications of the
> ACM 18:57-58.
> 
> Let's verify whether there's a typo in the equation or something. Maybe
> these guys even misinterpreted his work.


Thanks for trying it out, Mark.  Your code reads substantially the same as
mine, except that I was working inside of UnivariateImpl.
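
For what it's worth, here is a minimal sketch of one way the update could be
reconciled with the {1, 2, 2} test case.  It rests on an assumption that is
not confirmed from the paper: that S is the running sum of squared deviations
from the mean rather than the variance itself, so that the sample variance
would be S / (i - 1).  The class and method names are hypothetical and are not
part of UnivariateImpl.

```java
public class RunningVariance {

    /**
     * Applies the quoted update (y = x - m, z = y / i, m += z,
     * s += (i - 1) * y * z) and records s / (i - 1) after each value.
     * Element 0 is NaN because a sample variance needs two values.
     * Assumption: s is a sum of squared deviations, not a variance.
     */
    public static double[] sampleVariances(double[] vals) {
        double[] result = new double[vals.length];
        if (vals.length == 0) {
            return result;
        }
        double m = vals[0]; // running mean
        double s = 0.0;     // running sum of squared deviations (assumed)
        result[0] = Double.NaN;
        for (int i = 2; i <= vals.length; i++) {
            double y = vals[i - 1] - m; // deviation from the previous mean
            double z = y / i;
            m += z;
            s += (i - 1) * y * z;       // never decreases
            result[i - 1] = s / (i - 1); // can decrease as i grows
        }
        return result;
    }

    public static void main(String[] args) {
        double[] v = sampleVariances(new double[] { 1.0, 2.0, 2.0 });
        for (int n = 2; n <= v.length; n++) {
            System.out.println("n=" + n + " variance=" + v[n - 1]);
        }
    }
}
```

With the input {1, 2, 2}, s grows from 0.5 to about 0.667, while s / (i - 1)
falls from 0.5 to about 0.333, which is the decrease the test case expects.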

Google can't find the original paper online, but it does find Richard J.
Hanson's personal Web site, containing a bibliography of his publications and
two email addresses for him.  Anyone have the courage to email him without
having first read the original paper?  I wish I could derive the (or at least
an) updating variance formula myself; maybe I should try again.
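
A sketch of how such a derivation could go, under the assumption (mine, not
verified against Hanson's paper) that S_i denotes the sum of squared
deviations of the first i values from their mean m_i, with y = x_i - m_{i-1}
and z = y / i as in the formulas quoted above:

```latex
\begin{align*}
m_i &= m_{i-1} + z, \qquad y = x_i - m_{i-1}, \qquad z = \frac{y}{i} \\
S_i &= \sum_{k=1}^{i} (x_k - m_i)^2
     = \sum_{k=1}^{i-1} \bigl( (x_k - m_{i-1}) - z \bigr)^2 + (y - z)^2 \\
    &= S_{i-1} - 2z \sum_{k=1}^{i-1} (x_k - m_{i-1}) + (i-1)\,z^2
       + \left( \frac{i-1}{i} \right)^{2} y^2 \\
    &= S_{i-1} + \frac{(i-1)\,y^2}{i^2} + \frac{(i-1)^2\,y^2}{i^2}
     = S_{i-1} + \frac{(i-1)\,y^2}{i}
     = S_{i-1} + (i-1)\,y\,z
\end{align*}
```

The middle sum vanishes because deviations from the mean m_{i-1} add up to
zero.  Under this reading S_i never decreases, but S_i / (i - 1) can.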


Al

=====
Albert Davidson Chou

    Get answers to Mac questions at http://www.Mac-Mgrs.org/ .
