On 01/20/2015 06:32 PM, David G Johnston wrote:
Andrew Dunstan wrote
On 01/20/2015 01:26 PM, Arne Scheffer wrote:
And a very minor aspect:
The term "standard deviation" in your code stands for
the (corrected) sample standard deviation, I think,
because you divide by n-1 instead of n to keep the
estimator unbiased.
How about adding the prefix "sample"
to indicate that this is the estimator?

I don't understand. I'm following pretty exactly the calculations stated
at <http://www.johndcook.com/blog/standard_deviation/>


I'm not a statistician. Perhaps others who are more literate in
statistics can comment on this paragraph.
I'm largely in the same boat as Andrew but...

I take it that Arne is referring to:

http://en.wikipedia.org/wiki/Bessel's_correction

but the mere presence of an (n-1) divisor does not mean that is what is
happening.  In this particular situation I believe the (n-1) simply is a
necessary part of the recurrence formula and not any attempt to correct for
sampling bias when estimating a population's variance.  In fact, as far as
the database knows, the values provided to this function do represent an
entire population and such a correction would be unnecessary.  I guess it
boils down to whether "future" queries are considered part of the population
or whether the population changes upon each query being run and thus we are
calculating the ever-changing population variance.  Note point 3 in the
linked Wikipedia article.
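
To make the distinction concrete, here is a minimal Python sketch of the running calculation as I read it from the John D. Cook page linked above. The function names (welford, stddev_sample, stddev_population) are made up for illustration and are not taken from the patch. In this formulation the recurrence for the mean divides by the running count k, and the running sum S does not depend on the final divisor at all; the choice between n and n-1 appears only in the last step, which is where Bessel's correction would come in.

```python
import math

def welford(values):
    """Return (n, mean, s), where s is the running sum of squared deviations."""
    n = 0
    mean = 0.0
    s = 0.0
    for x in values:
        n += 1
        delta = x - mean          # deviation from the previous mean
        mean += delta / n         # M_k = M_{k-1} + (x_k - M_{k-1}) / k
        s += delta * (x - mean)   # S_k = S_{k-1} + (x_k - M_{k-1}) * (x_k - M_k)
    return n, mean, s

def stddev_sample(values):
    """Sample standard deviation: divide by n - 1 (Bessel's correction)."""
    n, _, s = welford(values)
    return math.sqrt(s / (n - 1)) if n > 1 else 0.0

def stddev_population(values):
    """Population standard deviation: divide by n."""
    n, _, s = welford(values)
    return math.sqrt(s / n) if n > 0 else 0.0
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the sum of squared deviations is 32, so the population figure is sqrt(32/8) = 2 while the sample figure is sqrt(32/7), slightly larger.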




Thanks. Still not quite sure what to do, though :-) I guess in the end we want our results to agree with those of the built-in stddev SQL function. I'll try to set up a test program to see if they do.

cheers

andrew
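
A test program along those lines could be sketched as follows. This is only an illustration in Python, not the test that was actually written: running_stddev and check are hypothetical names, and Python's statistics.stdev and statistics.pstdev are used as stand-ins for the SQL aggregates. In PostgreSQL, stddev is documented as a historical alias for stddev_samp, so agreeing with the built-in would mean dividing by n-1.

```python
import math
import random
import statistics

def running_stddev(values, sample=True):
    """One-pass standard deviation; sample=True divides by n - 1, else by n."""
    n, mean, s = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        s += delta * (x - mean)
    divisor = n - 1 if sample else n
    return math.sqrt(s / divisor) if divisor > 0 else 0.0

def check(trials=100, size=50, seed=1):
    """Compare the one-pass result with reference implementations on random data."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.uniform(0.0, 1000.0) for _ in range(size)]
        # statistics.stdev stands in for stddev / stddev_samp
        assert math.isclose(running_stddev(data, sample=True),
                            statistics.stdev(data), rel_tol=1e-9)
        # statistics.pstdev stands in for stddev_pop
        assert math.isclose(running_stddev(data, sample=False),
                            statistics.pstdev(data), rel_tol=1e-9)
    return True
```

Calling check() raises an AssertionError if the one-pass result drifts from either reference by more than the tolerance.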

