[issue27181] Add geometric mean to `statistics` module

Mark Dickinson Thu, 09 Jun 2016 01:24:24 -0700

Mark Dickinson added the comment:

Choice of algorithm is a bit tricky here. There are a couple of obvious 
algorithms that work mathematically but result in significant accuracy loss in 
an IEEE 754 floating-point implementation: one is `exp(mean(map(log, 
my_numbers)))`, where the log calls can introduce significant loss of 
information, and the other is `prod(x**(1./len(my_numbers)) for x in 
my_numbers)`, where the `**(1./n)` operation similarly discards information. A 
better algorithm numerically is `prod(my_numbers)**(1./len(my_numbers))`, but 
that's likely to overflow quickly for large datasets (and/or datasets 
containing large values).


I'd suggest something along the lines of 
`prod(my_numbers)**(1./len(my_numbers))`, but keeping track of the exponent of 
the product separately and renormalizing where necessary to avoid overflow.

There are also algorithms for improved accuracy in a product, along the same 
lines as the algorithm used in fsum. See e.g., the paper "Accurate 
Floating-Point Product and Exponentiation" by Stef Graillat. [1] (I didn't know 
about this paper: I just saw a reference to it in a StackOverflow comment [2], 
which reminded me of this issue.)

[1] http://www-pequan.lip6.fr/~graillat/papers/IEEE-TC-prod.pdf
[2] 
http://stackoverflow.com/questions/37715250/safe-computation-of-geometric-mean

----------
nosy: +mark.dickinson

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27181>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue27181] Add geometric mean to `statistics` module

Reply via email to