On Mon, Nov 30, 2009 at 12:30 AM, Colin J. Williams <c...@ncf.ca> wrote:
> On 29-Nov-09 17:13 PM, Dr. Phillip M. Feldman wrote:
>> All of the statistical packages that I am currently using and have used
>> in the past (Matlab, Minitab, R, S-plus) calculate standard deviation
>> using the sqrt(1/(n-1)) normalization, which gives a result that is
>> unbiased when sampling from a normally-distributed population. NumPy
>> uses the sqrt(1/n) normalization. I'm currently using the following code
>> to calculate standard deviations, but would much prefer if this could be
>> fixed in NumPy itself:
>>
>> def mystd(x=numpy.array([]), axis=None):
>>     """This function calculates the standard deviation of the input
>>     using the definition of standard deviation that gives an unbiased
>>     result for samples from a normally-distributed population."""
>>     xd = x - x.mean(axis=axis)
>>     return numpy.sqrt((xd * xd).sum(axis=axis) / (numpy.size(x, axis=axis) - 1.0))
>>
> Anne Archibald has suggested a work-around. Perhaps ddof could be set to
> 1 by default, as other values are rarely required.
>
> Where the distribution of a variate is not known a priori, I believe it
> can be shown that the n-1 divisor provides the best estimate of the
> variance.
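For what it's worth, the work-around Anne suggested is just the ddof
("delta degrees of freedom") keyword that numpy.std and numpy.var already
accept, so no custom helper is needed. A minimal sketch (the array x is
just made-up example data):

    import numpy

    x = numpy.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up sample

    # Default ddof=0: sum of squared deviations divided by n.
    print(numpy.std(x))

    # ddof=1: divided by n - 1, the convention used by Matlab/Minitab/R/S-plus.
    print(numpy.std(x, ddof=1))

    # The two variance estimates differ only by the factor n/(n - 1):
    n = x.size
    assert numpy.allclose(numpy.var(x, ddof=1), numpy.var(x) * n / (n - 1.0))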
There have been previous discussions on this (but I can't find them now),
and I believe the current default was chosen deliberately. I think it is
the view of the numpy developers that the n divisor has more desirable
properties in most cases than the traditional n-1 - see this paper by
Travis Oliphant for details: http://hdl.handle.net/1877/438

Cheers

Robin