On 04-Dec-09 10:54 AM, Bruce Southey wrote:
> On 12/04/2009 06:18 AM, yogesh karpate wrote:
>> @ Pauli and @ Colin:
>> Sorry for the late reply; I was busy with some other assignments.
>> # As far as normalization by (n) is concerned, the common assumption
>> is that the population is normally distributed and that the sample is
>> large enough to fit the normal distribution. But this standard
>> deviation, when applied to a small sample, tends to be too low, and
>> is therefore called biased.
>> # The correction known as Bessel's correction, i.e. normalization by
>> (n-1), exists for the standard deviation of small samples.
>> # In "Electrical and Electronic Measurements and Instrumentation" by
>> A.K. Sawhney, the first chapter, "Fundamentals of Measurements",
>> shows that for N=16 the standard deviation normalization is
>> (n-1)=15.
>> # While I was learning statistics, my course instructor would advise
>> taking n=20 for normalization by (n-1).
>> # "Probability and Statistics" in the Schaum's Outline Series is good
>> reading.
>> Regards
>> ~ymk
>>
> Hi,
> Basically, all that I see with these arbitrary values is that you are
> relying on the 'central limit theorem'
> (http://en.wikipedia.org/wiki/Central_limit_theorem). The real issue
> in using these values is how much statistical bias you will tolerate,
> especially in its impact on the uses of that estimate, because the
> uses of variance (such as in statistical tests) tend to be more
> influenced by bias than the estimate of variance itself. (Of course,
> many features rely on asymptotic properties, so bias concerns are less
> apparent at large sample sizes.)
>
> Obviously the default reflects the developers' background and
> requirements. There are multiple valid variance estimators in
> statistics with different denominators, such as N (the maximum
> likelihood estimator), N-1 (the restricted maximum likelihood
> estimator and certain Bayesian estimators) and Stein's
> (http://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator). So the
> current default behavior is valid and documented. Consequently you
> cannot just have one option or different functions (like certain
> programs), and NumPy's implementation actually lets you do all of
> these in a single function. So I see no reason to change, even if I
> have to add the ddof=1 argument; after all, 'Explicit is better than
> implicit' :-).
>
> Bruce

Bruce,
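To make the ddof point above concrete, here is a minimal sketch (the
sample values are invented purely for illustration) of how NumPy's
var() and std() expose these different normalizations through a single
argument:

import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # toy sample

# Default ddof=0 divides by N: the maximum likelihood estimator.
print(np.var(x))           # 4.0
# ddof=1 divides by N-1: the Bessel-corrected, unbiased estimator.
print(np.var(x, ddof=1))   # 4.571...
# The same keyword works for the standard deviation.
print(np.std(x), np.std(x, ddof=1))   # 2.0  2.138...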
I suggest that the Central Limit Theorem is tied in with the Law of
Large Numbers. When one has a smallish sample size, what gives the best
estimate of the variance? The Bessel correction provides a rationale,
based on expectations:
http://en.wikipedia.org/wiki/Bessel%27s_correction

It is difficult to understand the proof of Stein's example:
http://en.wikipedia.org/wiki/Proof_of_Stein%27s_example
The symbols used are not clearly defined. He seems interested in a
decision rule for estimating the mean of a sample and claims that his
approach is better than the traditional least-squares approach.

In most cases, the interest is likely to be in the variance, with a
view to establishing a confidence interval. In the widely used Analysis
of Variance (ANOVA), the degrees of freedom are reduced for each mean
estimated; see http://www.mnstate.edu/wasson/ed602lesson13.htm for the
example below:

Analysis of Variance Table

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F Ratio     p
Between Groups             25.20               2               12.60       5.178    <.05
Within Groups              29.20              12                2.43
Total                      54.40              14

There is a sample of 15 observations, divided into three groups
according to the number of hours of therapy. Thus the Total degrees of
freedom are 15 - 1 = 14, the Between Groups 3 - 1 = 2, and the Within
Groups (residual) 14 - 2 = 12.

Colin W.
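P.S. The arithmetic in that table can be checked in a few lines. This
is only an illustrative sketch: the sums of squares are copied from the
table above, and it assumes SciPy is available for the F distribution.

from scipy import stats

k, n = 3, 15                            # groups, total observations
ss_between, ss_within = 25.20, 29.20    # sums of squares from the table

df_between = k - 1                      # 3 - 1 = 2
df_within = (n - 1) - df_between        # 14 - 2 = 12
ms_between = ss_between / df_between    # 25.20 / 2  = 12.60
ms_within = ss_within / df_within       # 29.20 / 12 ~= 2.43
f_ratio = ms_between / ms_within        # ~5.178

# Upper tail of the F(2, 12) distribution gives the p-value.
p = stats.f.sf(f_ratio, df_between, df_within)  # ~0.024, i.e. < .05
print(f_ratio, p)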