It need not be exactly representable as such; take the mean of [1, 1+eps] for instance. Granted, there are at most two numbers representable in the original dtype which are closest to the true mean; but I'm not sure that computing them exactly is a tractable problem for arbitrary input.
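To make that concrete, a quick illustration (just a sketch; float64 and the default round-to-even are assumed):

import numpy as np

eps = np.finfo(np.float64).eps
a = np.array([1.0, 1.0 + eps])

# The true mean, 1 + eps/2, lies exactly halfway between the two
# neighbouring float64 values 1 and 1 + eps, so it is not itself
# representable; the computed result has to round to one of them.
print(np.nextafter(1.0, 2.0) == 1.0 + eps)  # True: 1 + eps is the next float after 1
print(a.mean())                             # 1.0, not the true mean 1 + eps/2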
I'm not sure what is considered best practice for these problems, or whether there is one, considering the heterogeneity of the problem. As noted earlier, summing a list of floating point values is a remarkably multifaceted problem once you get down into the details. I think it should be understood that all floating point algorithms are subject to floating point errors. As long as the algorithm used is specified, one can make an informed decision about whether the given algorithm will do what you expect of it. That's the best we can hope for.

If we are going to advocate doing 'clever' things behind the scenes, we have to keep backwards compatibility (not creating the possibility of producing worse results on the same input) and platform independence in mind. Funny summation orders could violate the former, depending on the implementation details, and 'using the highest machine precision available' violates the latter (and is horrible practice in general, imo: either you don't need the extra accuracy, or you do, and its absence on a given platform should be an error).

Perhaps pairwise summation in the original order of the data is the best option:

import numpy as np

q = np.ones((2,)*26, np.float32)
print(q.mean())
while q.ndim > 0:
    q = q.mean(axis=-1, dtype=np.float32)
print(q)

This only requires log(N) space on the stack if properly implemented, is not platform dependent, and should not have any backward-compatibility issues that I can think of (a rough sketch of the recursive scheme I have in mind is appended below the quoted message). But I'm not sure how easy it would be to implement, given the current framework. The ability to select a different algorithm via a kwarg wouldn't be a bad idea either, imo; nor would the ability to explicitly specify separate output and accumulator dtypes.

On Fri, Jul 25, 2014 at 8:00 PM, Alan G Isaac <alan.is...@gmail.com> wrote:

> On 7/25/2014 1:40 PM, Eelco Hoogendoorn wrote:
> > At the risk of repeating myself: explicit is better than implicit
>
> This sounds like an argument for renaming the `mean` function `naivemean`
> rather than `mean`. Whatever numpy names `mean`, shouldn't it
> implement an algorithm that produces the mean? And obviously, for any
> float data type, the mean value of the values in the array is representable
> as a value of the same type.
>
> Alan Isaac
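To sketch what I mean by a recursive pairwise scheme (just an illustration in Python; the helper name and the base-case size are made up, and the real thing would of course live in C):

import numpy as np

def pairwise_sum(x):
    # Split the input in half, sum each half recursively, then add the
    # two partial sums. With a base case, the recursion depth, and hence
    # the extra stack space, is O(log N), and the worst-case rounding
    # error grows roughly like O(log N) rather than the O(N) of a naive
    # left-to-right accumulation.
    if x.size <= 1024:
        return np.sum(x)  # small blocks: plain reduction
    mid = x.size // 2
    return pairwise_sum(x[:mid]) + pairwise_sum(x[mid:])

x = np.ones(2**26, np.float32)
print(pairwise_sum(x) / np.float32(x.size))  # 1.0, accumulated in float32 throughout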
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion