Hello all,

A little challenge that's been bothering me today.

The docs for std.algorithm give an illustration of its use to calculate mean and standard deviation in a single pass:

      // Compute sum and sum of squares in one pass
      r = reduce!("a + b", "a + b * b")(tuple(0.0, 0.0), a);
      assert(r[0] == 35); // sum
      assert(r[1] == 233); // sum of squares
      // Compute average and standard deviation from the above
      auto avg = r[0] / a.length;
      auto stdev = sqrt(r[1] / a.length - avg * avg);

However, this formula for standard deviation is one that is well known for being subject to potentially fatal rounding error. Consider the following:

      float[] a = [10_000.0f, 10_001.0f, 10_002.0f];

      auto r = reduce!("a + b", "a + b * b")(tuple(0.0f, 0.0f), a);

      auto avg = r[0] / a.length;
      auto sd = sqrt(r[1] / a.length - avg * avg);
      writeln(avg, "\t", sd);

... which gives a value of 0 for standard deviation when compiled with LDC or GDC, and -nan with DMD, when the correct answer is 0.816497.

An improved online formula calculates together two quantities:

             {  x[1]    if k == 1
    M[k]  =  {
             {  M[k-1] + (x[k] - M[k-1]) / k    , if k > 1

             {  0      if k == 1
    Q[k]  =  {
             {  Q[k-1] + (k - 1) * ((x[k] - M[k-1]) ^^ 2) / k   ,  if k > 1

... which one can readily prove give you respectively the mean of x[1], ..., x[k] and the sum of squares of deviations from the mean; so that you can calculate sample variance as Q[n] / (n - 1), or standard variance as Q[n] / n with standard deviation being given by the square root of this.

In a reduce sense, you'd want to calculate the mean according to,

    reduce!"a + (b - a)/k"(0.0, x);

... where k represents the index count 1, 2, 3, ... However, it's not evident to me how you could get reduce() to know this counting value.

Where calculating Q[k] is concerned, you need to know both the index value _and_ the value of the corresponding M[k]. Again, it's not evident to me how you'd indicate to reduce() what is needed.

Can anyone offer advice on how to achieve these calculations using reduce?

Thanks & best wishes,

    -- Joe

Reply via email to