On 12/04/2009 06:18 AM, yogesh karpate wrote:
@ Pauli and @ Colin:
Sorry for the late reply. I was busy
with some other assignments.
# As far as normalization by (n) is concerned, the common
assumption is that the population is normally distributed and the sample
size is large enough to fit the normal distribution. But this
standard deviation, when applied to a small sample, tends to be
too low; therefore it is called biased.
# The correction known as Bessel's correction exists for the std.
deviation of small samples, i.e. normalization by (n-1).
# In "electrical-and-electronic-measurements-and-instrumentation" by
A.K. Sawhney, in the 1st chapter of the book, "Fundamentals of
Measurements", it is shown that for N=16 the std. deviation
normalization was (n-1)=15.
# While I was learning statistics in my course, the instructor would
advise taking n=20 for normalization by (n-1).
# Probability and Statistics from the Schaum's Outline Series is good
reading.
Regards
~ymk
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Hi,
Basically, all that I see with these arbitrary values is that you are
relying on the 'central limit theorem'
(http://en.wikipedia.org/wiki/Central_limit_theorem). Really the issue
in using these values is how much statistical bias you will tolerate,
especially in its impact on how the estimate is used, because uses of
the variance (such as in statistical tests) tend to be more influenced
by bias than the estimate of variance itself. (Of course, many features
rely on asymptotic properties, so bias concerns are less apparent in
large sample sizes.)
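To make the bias point concrete, here is a minimal simulation sketch. It
assumes normally distributed data with true variance 1; the function
name mean_variance_estimates, the trial count and the seed are my own
illustrative choices, not anything from NumPy itself. It averages the
N-denominator and (N-1)-denominator estimates over many samples so the
two can be compared against the factor (n-1)/n.

```python
import numpy as np


def mean_variance_estimates(n, trials=20000, seed=0):
    """Average the ddof=0 and ddof=1 variance estimates over many samples.

    Hypothetical helper for illustration: draws `trials` samples of size
    `n` from a standard normal distribution (true variance 1.0).
    """
    rng = np.random.default_rng(seed)
    samples = rng.normal(loc=0.0, scale=1.0, size=(trials, n))
    biased = samples.var(axis=1, ddof=0).mean()    # denominator N
    unbiased = samples.var(axis=1, ddof=1).mean()  # denominator N-1
    return biased, unbiased


for n in (5, 20, 200):
    biased, unbiased = mean_variance_estimates(n)
    # The ddof=0 average should sit near (n-1)/n, the ddof=1 average near 1.
    print(n, round(biased, 3), round(unbiased, 3), (n - 1) / n)
```

With small n the N-denominator average falls visibly below 1, and the
gap shrinks as n grows, which is the asymptotic behaviour mentioned
above.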
Obviously the default relies on the developers' background and
requirements. There are multiple valid variance estimators in statistics
with different denominators, like N (maximum likelihood estimator), N-1
(restricted maximum likelihood estimator and certain Bayesian
estimators) and Stein's
(http://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator). So
the current default behavior is valid and documented. Consequently you
cannot just have one option, or different functions (like certain
programs), and NumPy's implementation actually allows you to do all of
these in a single function. So I also see no reason to change, even if I
have to add the ddof=1 argument; after all, 'Explicit is better than
implicit' :-).
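As a small illustration of the single-function point, the sketch below
uses the ddof argument of numpy.var and numpy.std, where the divisor is
N - ddof. The data values are made up for the example.

```python
import numpy as np

# Made-up sample of N=8 values; the mean is 5.0 and the sum of squared
# deviations from the mean is 32.0.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

var_n = np.var(x)             # default ddof=0: divide by N
var_n1 = np.var(x, ddof=1)    # Bessel's correction: divide by N-1
std_n1 = np.std(x, ddof=1)    # same correction applied to std

print(var_n)    # 32/8
print(var_n1)   # 32/7
print(std_n1)   # sqrt(32/7)
```

So the (n-1) normalization discussed in this thread is one keyword
argument away from the default.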
Bruce