On 12/04/2009 06:18 AM, yogesh karpate wrote:
@ Pauli and @ Colin:
Sorry for the late reply; I was busy with some other assignments.
# As far as normalization by n is concerned, it's a common assumption that the population is normally distributed and that the sample size is large enough to fit the normal distribution. But this standard deviation, when applied to a small sample, tends to be too low, and it is therefore called biased.
# The correction known as Bessel's correction exists for the standard deviation of small samples, i.e. normalization by (n-1).
# In "Electrical and Electronic Measurements and Instrumentation" by A.K. Sawhney, chapter 1, "Fundamentals of Measurements", it's shown that for N=16 the standard deviation was normalized by (n-1)=15.
# While I was learning statistics in my course, my instructor would advise taking n=20 for normalization by (n-1).
# "Probability and Statistics" in the Schaum's Outline Series is good reading.
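For anyone following along in NumPy, here is a minimal sketch of the two normalizations discussed above. The 16 readings are hypothetical values made up for illustration (not taken from Sawhney's book). `np.std` divides by N by default (`ddof=0`); passing `ddof=1` applies Bessel's correction and divides by N-1.

```python
import numpy as np

# Hypothetical readings; N = 16 to mirror the textbook case mentioned above.
readings = np.array([10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.4, 9.7,
                     10.0, 10.1, 9.9, 10.3, 9.8, 10.2, 10.0, 10.1])
n = readings.size

# NumPy default: sum of squared deviations divided by n (ddof=0).
std_n = np.std(readings)
# Bessel's correction: divide by n - 1 instead (ddof=1).
std_n1 = np.std(readings, ddof=1)

# The same quantities computed by hand from the definition.
sq_dev = np.sum((readings - readings.mean()) ** 2)
assert np.isclose(std_n, np.sqrt(sq_dev / n))
assert np.isclose(std_n1, np.sqrt(sq_dev / (n - 1)))

# The two differ only by the constant factor sqrt(n / (n - 1)).
print(std_n, std_n1, std_n1 / std_n)
```

For N=16 the correction factor is sqrt(16/15), roughly 1.033, so the difference is small but not negligible.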
Regards
~ymk


_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Hi,
Basically, all I see with these arbitrary values is that you are relying on the central limit theorem (http://en.wikipedia.org/wiki/Central_limit_theorem). The real issue in using these values is how much statistical bias you will tolerate, especially in how the estimate is subsequently used: quantities that use the variance (such as statistical tests) tend to be more influenced by bias than the variance estimate itself. (Of course, many features rely on asymptotic properties, so bias concerns are less apparent with large sample sizes.)
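A small simulation makes the bias concrete. The settings here are assumptions chosen for illustration (normal data with true variance 4, samples of size 5, 200,000 repetitions, a fixed seed). Averaged over many samples, the N-denominator variance should come out near (n-1)/n times the true variance, while the N-1 version should be close to the true value. Note that Bessel's correction makes the *variance* unbiased; its square root, the sample standard deviation, is still somewhat biased low.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

true_var = 4.0   # assumed population variance (sigma = 2)
n = 5            # small sample size, where the bias is most visible
reps = 200_000   # number of simulated samples

# Each row is one independent sample of size n drawn from N(0, true_var).
samples = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=(reps, n))

# Per-sample estimates with denominator n (ddof=0) and n - 1 (ddof=1).
var_n = samples.var(axis=1, ddof=0)
var_n1 = samples.var(axis=1, ddof=1)
std_n1 = samples.std(axis=1, ddof=1)

# Averages over all samples approximate each estimator's expected value.
print("mean var, ddof=0:", var_n.mean())   # should be near true_var * (n - 1) / n
print("mean var, ddof=1:", var_n1.mean())  # should be near true_var
print("mean std, ddof=1:", std_n1.mean())  # still below sigma = 2
```

Increasing `n` shrinks the gap between the two variance estimates, which is the asymptotic behavior mentioned above.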

Obviously the default depends on the developer's background and requirements. There are multiple valid variance estimators in statistics with different denominators, such as N (the maximum likelihood estimator), N-1 (the restricted maximum likelihood estimator and certain Bayesian estimators) and Stein's (http://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator). So the current default behavior is valid and documented. Consequently, you cannot just have one option or separate functions (as certain programs do), and NumPy's implementation actually allows you to do all of these in a single function. So I also see no reason to change, even if I have to add the ddof=1 argument; after all, 'Explicit is better than implicit' :-).
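To illustrate the single-function point: NumPy's `var` and `std` divide the sum of squared deviations by `N - ddof`, so the N (maximum likelihood) and N-1 estimators are just two choices of `ddof`. The sketch below checks this against a hand-written formula; `var_with_ddof` is a hypothetical helper written only for comparison (it is not part of NumPy), and the data are arbitrary.

```python
import numpy as np

def var_with_ddof(x, ddof):
    """Hypothetical reference helper: sum of squared deviations / (N - ddof)."""
    x = np.asarray(x, dtype=float)
    return np.sum((x - x.mean()) ** 2) / (x.size - ddof)

# Arbitrary example data: mean 5, sum of squared deviations 32, N = 8.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

for ddof in (0, 1):
    # ddof=0 -> 32 / 8 = 4.0 (denominator N); ddof=1 -> 32 / 7 (denominator N-1).
    print(ddof, np.var(x, ddof=ddof), var_with_ddof(x, ddof))
```

The explicit `ddof=1` in a call such as `np.var(x, ddof=1)` documents the choice of estimator at the call site.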

Bruce




