On Mon, Nov 30, 2009 at 12:30 AM, Colin J. Williams <c...@ncf.ca> wrote:
> On 29-Nov-09 17:13 PM, Dr. Phillip M. Feldman wrote:
>> All of the statistical packages that I am currently using and have used
>> in the past (Matlab, Minitab, R, S-plus) calculate standard deviation
>> using the sqrt(1/(n-1)) normalization, which gives a result that is
>> unbiased when sampling from a normally-distributed population. NumPy
>> uses the sqrt(1/n) normalization. I'm currently using the following code
>> to calculate standard deviations, but would much prefer if this could be
>> fixed in NumPy itself:
>>
>> def mystd(x=numpy.array([]), axis=None):
>>     """This function calculates the standard deviation of the input
>>     using the definition of standard deviation that gives an unbiased
>>     result for samples from a normally-distributed population."""
>>     xd = x - x.mean(axis=axis)
>>     return numpy.sqrt((xd * xd).sum(axis=axis) / (numpy.size(x, axis=axis) - 1.0))
>>
> Anne Archibald has suggested a work-around. Perhaps ddof could be set to
> 1 by default, as other values are rarely required.
>
> Where the distribution of a variate is not known a priori, I believe it
> can be shown that the n-1 divisor provides the best estimate of the
> variance.
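For what it's worth, the work-around Anne suggested is just the ddof
("delta degrees of freedom") keyword that numpy.std and numpy.var already
accept, so no custom helper is needed. A minimal sketch (the array x is
just made-up example data):

    import numpy

    x = numpy.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up sample

    # Default ddof=0: sum of squared deviations divided by n.
    print(numpy.std(x))

    # ddof=1: divided by n - 1, the convention used by Matlab/Minitab/R/S-plus.
    print(numpy.std(x, ddof=1))

    # The two variance estimates differ only by the factor n/(n - 1):
    n = x.size
    assert numpy.allclose(numpy.var(x, ddof=1), numpy.var(x) * n / (n - 1.0))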
There have been previous discussions on this (but I can't find them now),
and I believe the current default was chosen deliberately. I think it is
the view of the numpy developers that the n divisor has more desirable
properties in most cases than the traditional n-1 - see this paper by
Travis Oliphant for details: http://hdl.handle.net/1877/438

Cheers

Robin