Re: [Numpy-discussion] non-standard standard deviation

2009-12-11 Thread Dr. Phillip M. Feldman



Anne Archibald wrote:
 
 2009/11/29 Dr. Phillip M. Feldman pfeld...@verizon.net:
 
 All of the statistical packages that I am currently using and have used
 in
 the past (Matlab, Minitab, R, S-plus) calculate standard deviation using
 the
 sqrt(1/(n-1)) normalization, which gives a result that is unbiased when
 sampling from a normally-distributed population.  NumPy uses the
 sqrt(1/n)
 normalization.  I'm currently using the following code to calculate
 standard
 deviations, but would much prefer if this could be fixed in NumPy itself:
 
 This issue was the subject of lengthy discussions on the mailing list,
 the upshot of which is that in current versions of scipy, std and var
 take an optional argument ddof, into which you can supply 1 to get
 the normalization you want.
 
 Anne
 

You are right that I can get the result that I want by setting ddof. 
Thanks!

I still feel that the default value for ddof should be 1 rather than 0; new
users are unlikely to read the documentation for a command like std, because
it is reasonable to expect standard behavior across all statistical
packages.
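For reference, a minimal sketch of the ddof keyword Anne describes (the
sample data here is my own, chosen so the numbers come out round):

import numpy

x = numpy.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Default ddof=0: divide the summed squared deviations by n.
print(numpy.std(x))           # 2.0

# ddof=1: divide by n-1 (Bessel-corrected), as in Matlab/Minitab/R/S-plus.
print(numpy.std(x, ddof=1))   # about 2.138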

Phillip



Re: [Numpy-discussion] non-standard standard deviation

2009-12-06 Thread Colin J. Williams


On 04-Dec-09 10:54 AM, Bruce Southey wrote:
 On 12/04/2009 06:18 AM, yogesh karpate wrote:
 @ Pauli and @ Colin:
 Sorry for the late reply. I was
 busy with some other assignments.
 # As far as normalization by (n) is concerned, it's a common
 assumption that the population is normally distributed and the
 population size is fairly large enough to fit the normal distribution.
 But this standard deviation, when applied to a small population, tends
 to be too low; therefore it is called biased.
 # The correction known as the Bessel correction exists for small sample
 size std. deviation, i.e. normalization by (n-1).
 # In Electrical and Electronic Measurements and Instrumentation by
 A.K. Sawhney, in the first chapter of the book, "Fundamentals of
 Measurements", it is shown that for N=16 the std. deviation
 normalization was (n-1)=15.
 # While I was learning statistics, my course instructor would
 advise taking n=20 for normalization by (n-1).
 # Probability and Statistics from the Schaum's Series is good reading.
 Regards
 ~ymk



 Hi,
 Basically, all that I see with these arbitrary values is that you are
 relying on the 'central limit theorem'
 (http://en.wikipedia.org/wiki/Central_limit_theorem).  Really the
 issue in using these values is how much statistical bias you will
 tolerate, especially in its impact on the usage of that estimate,
 because usages of the variance (such as in statistical tests) tend to
 be more influenced by bias than the estimate of the variance itself.
 (Of course, many features rely on asymptotic properties, so bias
 concerns are less apparent in large sample sizes.)

 Obviously the default reflects the developer's background and
 requirements. There are multiple valid variance estimators in
 statistics, with different denominators like N (maximum likelihood
 estimator), N-1 (restricted maximum likelihood estimator and certain
 Bayesian estimators) and Stein's
 (http://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator). So
 the current default behavior is valid and documented. Consequently
 you cannot have just one option or different functions (like certain
 programs do), and Numpy's implementation actually allows you to do
 all these in a single function. So I see no reason to change, even
 if I have to add the ddof=1 argument; after all, 'Explicit is better
 than implicit' :-).

 Bruce
Bruce,

I suggest that the Central Limit Theorem is tied in with the Law of 
Large Numbers.

When one has a smallish sample size, what gives the best estimate of the
variance?  The Bessel Correction provides a rationale, based on
expectations: (http://en.wikipedia.org/wiki/Bessel%27s_correction).
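A small simulation sketch of that expectation argument (illustrative
only; the sample size and variance are arbitrary): for normal samples of
size n, the summed squared deviations about the sample mean average out
to (n-1)*sigma^2, not n*sigma^2, which is exactly what the n-1 divisor
corrects for.

import numpy

numpy.random.seed(0)
n, sigma2, trials = 5, 4.0, 100000
x = numpy.random.normal(0.0, numpy.sqrt(sigma2), size=(trials, n))
ss = ((x - x.mean(axis=1)[:, None])**2).sum(axis=1)
print(ss.mean())   # close to (n-1)*sigma2 = 16.0, not n*sigma2 = 20.0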

It is difficult to understand the proof of Stein: 
http://en.wikipedia.org/wiki/Proof_of_Stein%27s_example

The symbols used are not clearly stated.  He seems interested in a 
decision rule for the calculation of the mean of a sample and claims 
that his approach is better than the traditional Least Squares approach.

In most cases, the interest is likely to be in the variance, with a view 
to establishing a confidence interval.

In the widely used Analysis of Variance (ANOVA), the degrees of freedom 
are reduced for each mean estimated, see:
http://www.mnstate.edu/wasson/ed602lesson13.htm for the example below:

*Analysis of Variance Table*

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F Ratio   p
Between Groups        25.20            2                    12.60         5.178     .05
Within Groups         29.20            12                   2.43
Total                 54.40            14




There is a sample of 15 observations, which is divided into three 
groups, depending on the number of hours of therapy.
Thus, the Total degrees of freedom are 15-1 = 14,  the Between Groups 
3-1 = 2 and the Residual is 14 - 2 = 12.
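The arithmetic behind the table, as a short sketch (values copied from
the example above):

# Mean squares are sums of squares divided by their degrees of freedom.
ss_between, df_between = 25.20, 3 - 1     # groups - 1
ss_within,  df_within  = 29.20, 14 - 2    # total df - between df
ms_between = ss_between / df_between      # 12.60
ms_within  = ss_within / df_within        # about 2.43
print(ms_between / ms_within)             # F ratio, about 5.178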

Colin W.



Re: [Numpy-discussion] non-standard standard deviation

2009-12-06 Thread josef . pktd
On Sun, Dec 6, 2009 at 11:01 AM, Colin J. Williams c...@ncf.ca wrote:


 On 04-Dec-09 10:54 AM, Bruce Southey wrote:
snip
 Bruce,

 I suggest that the Central Limit Theorem is tied in with the Law of
 Large Numbers.

 When one has a smallish sample size, what gives the best estimate of the
 variance?  The Bessel Correction provides a rationale, based on
 expectations: (http://en.wikipedia.org/wiki/Bessel%27s_correction).

 It is difficult to understand the proof of Stein:
 http://en.wikipedia.org/wiki/Proof_of_Stein%27s_example

 The symbols used are not clearly stated.  He seems interested in a
 decision rule for the calculation of the mean of a sample and claims
 that his approach is better than the traditional Least Squares approach.

 In most cases, the interest is likely to be in the variance, with a view
 to establishing a confidence interval.

What's the best estimate? That's the main question.

Estimators differ in their (sample or posterior) distribution,
especially in bias and variance.
The Stein estimator dominates OLS in mean squared error: although it is
biased, the variance of the estimator is smaller than OLS's, so that the
MSE (squared bias plus variance) is also smaller for the Stein estimator
than for OLS.
Depending on the application there could be many possible loss functions,
including asymmetric ones, e.g. if it is more costly to over- than to
underestimate.
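A concrete sketch of that bias/variance trade-off for the variance
estimators under discussion (an illustrative simulation of my own; the
sample size is arbitrary): the n divisor is biased downward but, for
normal samples, has the smaller MSE.

import numpy

numpy.random.seed(1)
n, sigma2, trials = 5, 1.0, 200000
x = numpy.random.normal(0.0, 1.0, size=(trials, n))
for ddof in (0, 1):
    est = x.var(axis=1, ddof=ddof)
    bias = est.mean() - sigma2
    mse = ((est - sigma2)**2).mean()   # MSE = bias**2 + estimator variance
    print(ddof, bias, mse)
# ddof=0 shows a negative bias but the smaller MSE;
# ddof=1 is (essentially) unbiased but has the larger MSE.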

The following is a good book on this, one that I read a long time ago:
Statistical Decision Theory and Bayesian Analysis by James O. Berger

http://books.google.ca/books?id=oY_x7dE15_AC&pg=PP1&lpg=PP1&dq=berger+decision&source=bl&ots=wzL3ocu5_9&sig=lGm5VevPtnFW570mgeqJklASalU&hl=en&ei=P9cbS5CSCIqllAf-0f3xCQ&sa=X&oi=book_result&ct=result&resnum=4&ved=0CBcQ6AEwAw#v=onepage&q=&f=false



 In the widely used Analysis of Variance (ANOVA), the degrees of freedom
 are reduced for each mean estimated, see:
 http://www.mnstate.edu/wasson/ed602lesson13.htm for the example below:

 *Analysis of Variance Table*

 Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F Ratio   p
 Between Groups        25.20            2                    12.60         5.178     .05
 Within Groups         29.20            12                   2.43
 Total                 54.40            14


 There is a sample of 15 observations, which is divided into three
 groups, depending on the number of hours of therapy.
 Thus, the Total degrees of freedom are 15-1 = 14,  the Between Groups
 3-1 = 2 and the Residual is 14 - 2 = 12.

Statistical tests are the only area where I really pay attention to the
degrees of freedom, since the 

Re: [Numpy-discussion] non-standard standard deviation

2009-12-06 Thread Charles R Harris
On Sun, Dec 6, 2009 at 9:21 AM, josef.p...@gmail.com wrote:

 On Sun, Dec 6, 2009 at 11:01 AM, Colin J. Williams c...@ncf.ca wrote:
 


snip


 What's the best estimate? That's the main question.

 Estimators differ in their (sample or posterior) distribution,
 especially in bias and variance.
 The Stein estimator dominates OLS in mean squared error: although
 it is biased, the variance of the estimator is smaller than OLS's,
 so that the MSE (squared bias plus variance) is also smaller for the
 Stein estimator than for OLS.
 Depending on the application there could be many possible loss
 functions, including asymmetric ones, e.g. if it is more costly to
 over- than to underestimate.

 The following is a good book on this, one that I read a long time ago:
 Statistical Decision Theory and Bayesian Analysis by James O. Berger

 http://books.google.ca/books?id=oY_x7dE15_AC&pg=PP1&lpg=PP1&dq=berger+decision&source=bl&ots=wzL3ocu5_9&sig=lGm5VevPtnFW570mgeqJklASalU&hl=en&ei=P9cbS5CSCIqllAf-0f3xCQ&sa=X&oi=book_result&ct=result&resnum=4&ved=0CBcQ6AEwAw#v=onepage&q=&f=false


At last, an explanation I can understand. Thanks Josef.

Chuck


Re: [Numpy-discussion] non-standard standard deviation

2009-12-06 Thread Sturla Molden
Colin J. Williams wrote:
 When one has a smallish sample size, what gives the best estimate of the
 variance?
What do you mean by best estimate?

Unbiased? Smallest standard error?


 In the widely used Analysis of Variance (ANOVA), the degrees of freedom 
 are reduced for each mean estimated, 
That is for statistical tests, not to compute estimators.







Re: [Numpy-discussion] non-standard standard deviation

2009-12-06 Thread Bruce Southey
On Sun, Dec 6, 2009 at 11:36 AM, Sturla Molden stu...@molden.no wrote:
 Colin J. Williams wrote:
 When one has a smallish sample size, what gives the best estimate of the
 variance?
 What do you mean by best estimate?

 Unbiased? Smallest standard error?


 In the widely used Analysis of Variance (ANOVA), the degrees of freedom
 are reduced for each mean estimated,
 That is for statistical tests, not to compute estimators.







Ignoring the estimation method, there is no correct answer unless you
impose various conditions, like requiring a minimum-variance unbiased
estimator (http://en.wikipedia.org/wiki/Minimum_variance_unbiased),
where usually N-1 wins.

Anyhow, this is way off topic since it is totally in the realm of math stats.

The law of large numbers
(http://en.wikipedia.org/wiki/Law_of_large_numbers) addresses only the
average, not the variance, so it is not directly applicable.

Bruce


Re: [Numpy-discussion] non-standard standard deviation

2009-12-05 Thread Colin J. Williams


On 04-Dec-09 05:21 AM, Pauli Virtanen wrote:
 Fri, 2009-12-04 at 11:19 +0100, Chris Colbert wrote:

 Why can't the divisor constant just be made an optional kwarg that
 defaults to zero?
  
 It already is an optional kwarg that defaults to zero.

 Cheers,

I suggested that 1 (one) would be a better default but Robert Kern told 
us that it won't happen.

Colin W.


Re: [Numpy-discussion] non-standard standard deviation

2009-12-05 Thread Colin J. Williams


On 04-Dec-09 07:18 AM, yogesh karpate wrote:
snip




Yogesh,

Thanks for the Bessel name, I hadn't come across that before.

The Wikipedia reference for the Bessel Correction uses a divisor of n-1: 
http://en.wikipedia.org/wiki/Bessel%27s_correction

Perhaps the simplification for larger n comes from the fact that, for
large n, 1/n ≈ 1/(n-1).

I would suggest C. E. Weatherburn - Mathematical Statistics,  but I 
doubt whether it is still widely available.

Colin W.




Re: [Numpy-discussion] non-standard standard deviation

2009-12-05 Thread Sturla Molden
Colin J. Williams wrote:
 I suggested that 1 (one) would be a better default but Robert Kern told
 us that it won't happen.
I don't even see the need for this keyword argument, as you can always 
multiply the variance by n/(n-1) to get what you want.
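That is, as a one-line sketch (sample data of my own for illustration):

import numpy

x = numpy.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = x.size
print(x.var() * n / (n - 1.0))   # identical to x.var(ddof=1)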

Also, normalization by n gives the ML estimate (yes, it has a bias, but 
it is better anyway). It is a common novice mistake to use 1/(n-1) as 
normalization, probably due to poor advice in introductory statistics 
textbooks. It also seems that frequentists are more scared of this 
bias boogey monster than Bayesians. It may actually help beginners to 
avoid this mistake if numpy's implementation prompts them to ask why the 
normalization is 1/n.

If numpy is to change the implementation of std, var, and cov, I suggest 
using the two-pass algorithm to reduce rounding error. (I can provide C 
code.) This is much more important than changing the normalization to a 
bias-free but otherwise inferior value.
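For reference, a sketch of the two-pass idea in Python (Sturla offered C
code; this is only an illustration): the first pass computes the mean,
the second accumulates squared deviations about it, avoiding the
catastrophic cancellation that the one-pass mean(x**2) - mean(x)**2
formula can suffer from.

import numpy

def var_two_pass(x, ddof=0):
    x = numpy.asarray(x, dtype=float)
    m = x.mean()                                  # pass 1: the mean
    return ((x - m)**2).sum() / (x.size - ddof)   # pass 2: deviations

x = numpy.array([1e9 + 1.0, 1e9 + 2.0, 1e9 + 3.0])
print(var_two_pass(x))              # 0.666..., correct
print((x**2).mean() - x.mean()**2)  # one-pass formula; can be far off here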

Sturla







Re: [Numpy-discussion] non-standard standard deviation

2009-12-04 Thread Chris Colbert
Why can't the divisor constant just be made an optional kwarg that
defaults to zero?
It won't break any existing code, and will let everybody who wants the
other behavior have it.

On Thu, Dec 3, 2009 at 1:49 PM, Colin J. Williams c...@ncf.ca wrote:
 Yogesh,

 Could you explain the rationale for this choice please?

 Colin W.

 On 03-Dec-09 00:35 AM, yogesh karpate wrote:
 The thing is that the normalization by (n-1) is done for the no. of
 samples >20 or 23 (not sure about this number, but sure that it isn't
 greater than 25), and below that we use normalization by n.
 Regards
 ~ymk





Re: [Numpy-discussion] non-standard standard deviation

2009-12-04 Thread Pauli Virtanen
Fri, 2009-12-04 at 11:19 +0100, Chris Colbert wrote:
 Why can't the divisor constant just be made an optional kwarg that
 defaults to zero?

It already is an optional kwarg that defaults to zero.

Cheers,
-- 
Pauli Virtanen




Re: [Numpy-discussion] non-standard standard deviation

2009-12-04 Thread Pauli Virtanen
Thu, 03 Dec 2009 11:05:07 +0530, yogesh karpate wrote:

 The thing is that the normalization by (n-1) is done for the no. of
 samples >20 or 23 (not sure about this number, but sure that it isn't
 greater than 25), and below that we use normalization by n.
 Regards
 ~ymk

Just to clarify: Numpy (of course) does not change the divisor depending 
on `n` -- Yogesh's post probably concerns some code of his own.

-- 
Pauli Virtanen



Re: [Numpy-discussion] non-standard standard deviation

2009-12-04 Thread yogesh karpate
@ Pauli and @ Colin:
Sorry for the late reply. I was busy with
some other assignments.
# As far as normalization by (n) is concerned, it's a common assumption
that the population is normally distributed and the population size is
fairly large enough to fit the normal distribution. But this standard
deviation, when applied to a small population, tends to be too low;
therefore it is called biased.
# The correction known as the Bessel correction exists for small sample
size std. deviation, i.e. normalization by (n-1).
# In Electrical and Electronic Measurements and Instrumentation by A.K.
Sawhney, in the first chapter of the book, "Fundamentals of
Measurements", it is shown that for N=16 the std. deviation
normalization was (n-1)=15.
# While I was learning statistics, my course instructor would advise
taking n=20 for normalization by (n-1).
# Probability and Statistics from the Schaum's Series is good reading.
Regards
~ymk


Re: [Numpy-discussion] non-standard standard deviation

2009-12-04 Thread Bruce Southey

On 12/04/2009 06:18 AM, yogesh karpate wrote:

snip

Hi,
Basically, all that I see with these arbitrary values is that you are 
relying on the 'central limit theorem' 
(http://en.wikipedia.org/wiki/Central_limit_theorem).  Really the issue 
in using these values is how much statistical bias you will tolerate, 
especially in its impact on the usage of that estimate, because usages 
of the variance (such as in statistical tests) tend to be more 
influenced by bias than the estimate of the variance itself. (Of course, 
many features rely on asymptotic properties, so bias concerns are less 
apparent in large sample sizes.)


Obviously the default reflects the developer's background and 
requirements. There are multiple valid variance estimators in statistics, 
with different denominators like N (maximum likelihood estimator), N-1 
(restricted maximum likelihood estimator and certain Bayesian 
estimators) and Stein's 
(http://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator). So the 
current default behavior is valid and documented. Consequently you 
cannot have just one option or different functions (like certain 
programs do), and Numpy's implementation actually allows you to do all 
these in a single function. So I see no reason to change, even if I have 
to add the ddof=1 argument; after all, 'Explicit is better than implicit' :-).


Bruce







Re: [Numpy-discussion] non-standard standard deviation

2009-12-04 Thread Christopher Barker
This is getting OT, as I'm not making any comment on numpy's 
implementation, but...

yogesh karpate wrote:

 # As far as normalization by (n) is concerned, it's a common assumption 
 that the population is normally distributed and the population size is 
 fairly large enough to fit the normal distribution. But this standard 
 deviation, when applied to a small population, tends to be too low; 
 therefore it is called biased.

OK.

 # The correction known as the Bessel correction exists for small sample 
 size std. deviation, i.e. normalization by (n-1).

but why only small sizes -- the beauty of the approach is that the -1 
makes less and less difference the larger n gets.

 # It is shown that for N=16 the std. deviation normalization was (n-1)=15.
 # While I was learning statistics, my course instructor would advise 
 taking n=20 for normalization by (n-1)

Which introduces a discontinuity -- I never like discontinuities -- why 
bother? For large n, it makes no practical difference; for small n you 
want the -1 -- why arbitrarily decide what "small" is?
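A quick sketch of that point: the ratio between the two estimates is
sqrt(n/(n-1)), which decays smoothly toward 1 -- there is no natural
cut-off.

import numpy

for n in (5, 20, 100, 1000):
    print(n, numpy.sqrt(n / (n - 1.0)))   # 1.118, 1.026, 1.005, 1.0005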

From an engineering/applied science point of view, I take the view 
expressed in the Wikipedia page on "Unbiased estimation of standard 
deviation":

"...the task has little relevance to applications of statistics..."

-Chris



-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR             (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] non-standard standard deviation

2009-12-03 Thread Colin J. Williams
Yogesh,

Could you explain the rationale for this choice please?

Colin W.

On 03-Dec-09 00:35 AM, yogesh karpate wrote:
 The thing is that the normalization by (n-1) is done for the no. of 
 samples >20 or 23 (not sure about this number, but sure that it isn't 
 greater than 25), and below that we use normalization by n.
 Regards
 ~ymk





Re: [Numpy-discussion] non-standard standard deviation

2009-12-02 Thread yogesh karpate
The thing is that the normalization by (n-1) is done for the no. of samples
>20 or 23 (not sure about this number, but sure that it isn't greater
than 25), and below that we use normalization by n.
Regards
~ymk


Re: [Numpy-discussion] non-standard standard deviation

2009-11-30 Thread Sturla Molden
Colin J. Williams wrote:
 Where the distribution of a variate is not known a priori, I believe
 that it can be shown that the n-1 divisor provides the best estimate
 of the variance.
Have you ever been shooting with a rifle?

What would you rather do:

- Hit 9 or 10, with a bias to the right.

- Hit 7 or better, with no bias.

Do you think it can be shown that the latter option is the better one?

No?


Sturla Molden


Re: [Numpy-discussion] non-standard standard deviation

2009-11-29 Thread Anne Archibald
2009/11/29 Dr. Phillip M. Feldman pfeld...@verizon.net:

 All of the statistical packages that I am currently using and have used in
 the past (Matlab, Minitab, R, S-plus) calculate standard deviation using the
 sqrt(1/(n-1)) normalization, which gives a result that is unbiased when
 sampling from a normally-distributed population.  NumPy uses the sqrt(1/n)
 normalization.  I'm currently using the following code to calculate standard
 deviations, but would much prefer if this could be fixed in NumPy itself:

This issue was the subject of lengthy discussions on the mailing list,
the upshot of which is that in current versions of scipy, std and var
take an optional argument ddof, into which you can supply 1 to get
the normalization you want.

Anne

 def mystd(x=numpy.array([]), axis=None):
     """This function calculates the standard deviation of the input
     using the definition of standard deviation that gives an unbiased
     result for samples from a normally-distributed population."""
     xd = x - x.mean(axis=axis)
     return sqrt( (xd*xd).sum(axis=axis) / (numpy.size(x,axis=axis)-1.0) )


Re: [Numpy-discussion] non-standard standard deviation

2009-11-29 Thread Colin J. Williams


On 29-Nov-09 17:13 PM, Dr. Phillip M. Feldman wrote:
 All of the statistical packages that I am currently using and have used in
 the past (Matlab, Minitab, R, S-plus) calculate standard deviation using the
 sqrt(1/(n-1)) normalization, which gives a result that is unbiased when
 sampling from a normally-distributed population.  NumPy uses the sqrt(1/n)
 normalization.  I'm currently using the following code to calculate standard
 deviations, but would much prefer if this could be fixed in NumPy itself:

 import numpy
 from numpy import sqrt

 def mystd(x=numpy.array([]), axis=None):
     """This function calculates the standard deviation of the input
     using the definition of standard deviation that gives an unbiased
     result for samples from a normally-distributed population."""
     xd = x - x.mean(axis=axis)
     return sqrt( (xd*xd).sum(axis=axis) / (numpy.size(x,axis=axis)-1.0) )
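For what it's worth, a quick sketch (assuming numpy is imported and
mystd is defined as above; the sample data is illustrative) checking
that this matches numpy's built-in std with ddof=1:

x = numpy.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(mystd(x))               # about 2.138
print(numpy.std(x, ddof=1))   # same value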

Anne Archibald has suggested a work-around.  Perhaps ddof could be set 
by default to 1, as other values are rarely required.

Where the distribution of a variate is not known a priori, I believe 
that it can be shown that the n-1 divisor provides the best estimate 
of the variance.

Colin W.


Re: [Numpy-discussion] non-standard standard deviation

2009-11-29 Thread Robin
On Mon, Nov 30, 2009 at 12:30 AM, Colin J. Williams c...@ncf.ca wrote:
 On 29-Nov-09 17:13 PM, Dr. Phillip M. Feldman wrote:
snip

 Anne Archibald has suggested a work-around.  Perhaps ddof could be set
 by default to 1, as other values are rarely required.

 Where the distribution of a variate is not known a priori, I believe
 that it can be shown that the n-1 divisor provides the best estimate
 of the variance.

There have been previous discussions on this (but I can't find them
now) and I believe the current default was chosen deliberately. I
think it is the view of the numpy developers that the n divisor has
more desirable properties in most cases than the traditional n-1 --
see this paper by Travis Oliphant for details:
http://hdl.handle.net/1877/438

Cheers

Robin