[issue21184] statistics.pvariance with known mean does not work as expected

Steven D'Aprano Tue, 08 Apr 2014 20:15:06 -0700

New submission from Steven D'Aprano:

If you know the population mean mu, you should calculate the variance of a 
sample by passing mu as an explicit argument to statistics.pvariance. 
Unfortunately, this doesn't work as designed:


py> import statistics
py> data = [1, 2, 2, 2, 3, 4]  # sample from a population with mu=2.5
py> statistics.pvariance(data)  # uses the sample mean 2.3333...
0.8888888888888888
py> statistics.pvariance(data, 2.5)  # using known population mean
0.8888888888888888

The second calculation ought to be 0.91666... not 0.88888...
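
Both figures can be checked straight from the definition, without going 
through the statistics module. What follows is only a minimal sketch: the 
helper name mean_square_deviation is made up for illustration and is not part 
of the module, and exact fractions are used so that rounding cannot blur the 
comparison.

```python
from fractions import Fraction

def mean_square_deviation(data, centre):
    """Mean of the squared deviations of data about centre.

    Hypothetical helper for illustration; not part of statistics.
    """
    total = sum((Fraction(x) - centre) ** 2 for x in data)
    return total / len(data)

data = [1, 2, 2, 2, 3, 4]
sample_mean = Fraction(sum(data), len(data))  # 7/3

# Deviations taken about the sample mean: what pvariance(data) returns.
about_sample_mean = mean_square_deviation(data, sample_mean)

# Deviations taken about the known population mean mu = 2.5.
about_known_mu = mean_square_deviation(data, Fraction(5, 2))

print(about_sample_mean, float(about_sample_mean))
print(about_known_mu, float(about_known_mu))
```

The two results are 8/9 and 11/12, which match the 0.88888... and 0.91666... 
quoted above.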

The problem lies with the private function _ss, which calculates the sum of 
squared deviations. Unfortunately it is too clever: it includes an error 
adjustment term

ss -= _sum((x-c) for x in data)**2/len(data)

which is mathematically zero when c is calculated as the mean of data, but 
due to rounding may not be quite zero. When c is given explicitly, as happens 
if the caller provides an explicit mu argument to pvariance, the term is no 
longer close to zero: it equals len(data)*(mean(data) - c)**2, and 
subtracting it turns the sum of squares about c back into the sum of squares 
about the sample mean. In effect, the error adjustment neutralizes the 
explicit mu.
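
The cancellation is easy to confirm with exact arithmetic. The sketch below 
re-creates the adjustment under an assumed name, adjusted_ss; it is not the 
module's code, and it uses the built-in sum rather than the private _sum. It 
checks that the adjusted result is the same for several unrelated values of c.

```python
from fractions import Fraction

def adjusted_ss(data, c):
    """Sum of squared deviations about c, minus the adjustment term.

    Illustrative re-creation of the behaviour described above.
    """
    n = len(data)
    ss = sum((x - c) ** 2 for x in data)
    ss -= sum((x - c) for x in data) ** 2 / n
    return ss

data = [Fraction(x) for x in [1, 2, 2, 2, 3, 4]]
xbar = sum(data) / len(data)
ss_about_mean = sum((x - xbar) ** 2 for x in data)

for c in (Fraction(5, 2), Fraction(0), Fraction(100)):
    # Whatever c is, the adjustment pulls the result back to the
    # sum of squares about the sample mean.
    assert adjusted_ss(data, c) == ss_about_mean
```

Because the adjusted value does not depend on c at all, any explicit mu gives 
the same answer as no mu.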

The obvious fix is to just skip the adjustment in _ss when c is explicitly 
given, but I'm not sure if that's the best approach.
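
For concreteness, here is a rough sketch of that obvious fix. It is not a 
proposed patch: ss_sketch and pvariance_sketch are hypothetical stand-ins for 
_ss and pvariance, they use the built-in sum instead of the private _sum, and 
they skip the module's input validation.

```python
def ss_sketch(data, c=None):
    """Sum of squared deviations about c (hypothetical stand-in for _ss)."""
    data = list(data)
    n = len(data)
    if c is None:
        # Centre computed from the data: keep the rounding adjustment.
        c = sum(data) / n
        ss = sum((x - c) ** 2 for x in data)
        ss -= sum((x - c) for x in data) ** 2 / n
    else:
        # Centre supplied by the caller: use it as given, no adjustment.
        ss = sum((x - c) ** 2 for x in data)
    return ss

def pvariance_sketch(data, mu=None):
    """Population variance, optionally about a known mean mu."""
    data = list(data)
    return ss_sketch(data, mu) / len(data)

data = [1, 2, 2, 2, 3, 4]
print(pvariance_sketch(data))       # about the sample mean
print(pvariance_sketch(data, 2.5))  # about the known population mean
```

With this change the two calls no longer agree: the first stays near 0.8889 
and the second comes out near 0.9167.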

----------
assignee: stevenjd
components: Library (Lib)
messages: 215802
nosy: stevenjd
priority: normal
severity: normal
stage: needs patch
status: open
title: statistics.pvariance with known mean does not work as expected
type: behavior
versions: Python 3.4, Python 3.5

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21184>
_______________________________________