It need not be exactly representable as such; take the mean of [1, 1+eps]
for instance. Granted, there are at most two numbers in the range of the
original dtype which are closest to the true mean; but I'm not sure that
computing them exactly is a tractable problem for arbitrary input.
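
For instance (a quick sketch of my own; the Fraction comparison is just
one way to check it):

import numpy as np
from fractions import Fraction

eps = np.finfo(np.float64).eps
x = np.array([1.0, 1.0 + eps])

# The true mean, 1 + eps/2, as an exact rational.
true_mean = (Fraction(1) + Fraction(1.0 + eps)) / 2

# It lies strictly between the two nearest float64 values...
print(Fraction(1.0) < true_mean < Fraction(1.0 + eps))  # True

# ...so any result in the original dtype is necessarily rounded.
print(x.mean() in (1.0, 1.0 + eps))  # True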

I'm not sure what is considered best practice for these problems, or
whether there is one, considering the heterogeneity of the problem. As
noted earlier, summing a list of floating point values is a remarkably
multifaceted problem, once you get down into the details.

I think it should be understood that all floating point algorithms are
subject to floating point errors. As long as the algorithm used is
specified, one can make an informed decision about whether the given
algorithm will do what you expect of it. That's the best we can hope for.
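
A small example of what I mean (mine, not anything numpy does): the same
values summed by different algorithms give different answers, and it is
knowing the algorithm that lets you judge the result:

import math

vals = [1e16, 1.0, -1e16] * 1000

# Naive left-to-right summation drops every 1.0, which is below
# the ulp of the running total of 1e16.
print(sum(vals))        # 0.0

# A compensated, correctly rounded sum recovers them.
print(math.fsum(vals))  # 1000.0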

If we are going to advocate doing 'clever' things behind the scenes, we
have to keep backwards compatibility (not creating the possibility of
producing worse results on the same input) and platform independence in
mind. Funny summation orders could violate the former, depending on the
implementation details, and 'using the highest machine precision available'
violates the latter (and is horrible practice in general, imo: either you
don't need the extra accuracy, or you do, and its absence on a given
platform should be an error).
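
To make that last point concrete (a sketch, taking np.longdouble as the
'highest available precision'): long double is 80-bit extended on most
x86 Linux builds but just a 64-bit double on, e.g., MSVC builds, so code
that genuinely needs the extra precision ought to check for it and fail
loudly:

import numpy as np

# If long double is no more precise than float64 on this platform,
# error out rather than silently compute with less accuracy.
if np.finfo(np.longdouble).eps >= np.finfo(np.float64).eps:
    raise RuntimeError("extended precision not available on this platform")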

Perhaps pairwise summation in the original order of the data is the best
option:

import numpy as np

# 2**26 ones: a naive float32 running sum would saturate at 2**24.
q = np.ones((2,) * 26, np.float32)
print(q.mean())

# Averaging two values at a time keeps every intermediate exact here.
while q.ndim > 0:
    q = q.mean(axis=-1, dtype=np.float32)
print(q)

This only requires log(N) space on the stack if properly implemented, and
it is not platform dependent, nor should it have any backwards
compatibility issues that I can think of. But I'm not sure how easy it
would be to implement, given the current framework. The ability to select
different algorithms via a kwarg wouldn't be a bad idea either, imo; nor
would the ability to explicitly specify a separate output and accumulator
dtype.
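
For what it's worth, the accumulator half of that already exists as the
dtype kwarg to mean; a sketch of emulating separate accumulator and
output dtypes with the current API:

import numpy as np

q = np.ones(2**26, np.float32)

# Accumulate in float64, then cast the result back to the storage dtype.
out = np.float32(q.mean(dtype=np.float64))
print(out)  # 1.0, at float32 precision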


On Fri, Jul 25, 2014 at 8:00 PM, Alan G Isaac <alan.is...@gmail.com> wrote:

> On 7/25/2014 1:40 PM, Eelco Hoogendoorn wrote:
> > At the risk of repeating myself: explicit is better than implicit
>
>
> This sounds like an argument for renaming the `mean` function `naivemean`
> rather than `mean`.  Whatever numpy names `mean`, shouldn't it
> implement an algorithm that produces the mean?  And obviously, for any
> float data type, the mean value of the values in the array is representable
> as a value of the same type.
>
> Alan Isaac
>