Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

To elaborate on that point: knowing that numpy accumulates in a simple first-to-last sweep and does not implicitly upcast, the original problem can be solved in several ways: by specifying a higher precision to sum with, or by a nested summation, like X.mean(0).mean(0) == 1.0. I personally like this explicitness, and am wary of numpy doing overly clever things behind the scenes, as I can think of other code that might break if things change too radically. For instance, I often sort large arrays with a large spread in magnitudes before summation, relying on the fact that summing the smallest values first gives the best precision. Any changes made to reduction behavior should try to stay backwards compatible with such properties of straightforward reductions, or else a lot of code is going to break without warning.

I suppose using maximum precision internally, and nesting all reductions over multiple axes of an ndarray, are both easy-to-implement improvements that do not come with any drawbacks that I can think of. Actually, the maximum precision I am not so sure of, as I personally prefer to make an informed decision about the precision used, and to get an error on a platform that does not support the specified precision, rather than obtain subtly or horribly broken results without warning when moving my code to a different platform, compiler, whatever.

On Fri, Jul 25, 2014 at 5:37 AM, Eelco Hoogendoorn <hoogendoorn.ee...@gmail.com> wrote:

> Perhaps it is a slightly semantic discussion; but all fp calculations have errors, and there are always strategies for making them smaller. We just don't happen to like the error in this case; but rest assured it won't be hard to find new cases of 'blatantly wrong' results, no matter what accumulator is implemented. That's no reason not to try to be clever about it, but there isn't going to be an algorithm that is best for all possible inputs, and in the end the most important thing is that the algorithm used is specified in the docs.
>
> From: Alan G Isaac <alan.is...@gmail.com>
> Sent: 25-7-2014 00:10
> To: Discussion of Numerical Python <numpy-discussion@scipy.org>
> Subject: Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays
>
>> On 7/24/2014 4:42 PM, Eelco Hoogendoorn wrote:
>>> This isn't a bug report, but rather a feature request.
>>
>> I'm not sure that statement is correct. The mean of a float32 array can certainly be computed as a float32. Currently this is not necessarily what happens, not even approximately. That feels a lot like a bug, even if we can readily understand how the algorithm currently used produces it. To say whether it is a bug or not, don't we have to ask about the intent of `mean`? If the intent is to sum and divide, then it is not a bug. If the intent is to produce the mean, then it is a bug.
>>
>> Alan Isaac
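A minimal sketch of the failure mode and the two explicit workarounds mentioned above (the array size here is illustrative, and the drift only shows up on NumPy builds that accumulate naively; the pairwise summation added in 1.9 largely fixes this particular case):

    import numpy as np

    X = np.ones((32768, 1024), dtype=np.float32)  # 2**25 ones

    # A first-to-last float32 sweep stalls once the running sum hits
    # 2**24, since 2**24 + 1 rounds back to 2**24 in float32.
    print(X.mean())                  # can come out near 0.5 with naive summation

    # Workaround 1: request a higher-precision accumulator explicitly.
    print(X.mean(dtype=np.float64))  # 1.0

    # Workaround 2: nested reduction keeps every partial sum small.
    print(X.mean(axis=0).mean())     # 1.0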
Re: [Numpy-discussion] change default integer from int32 to int64 on win64?
The dtype returned by np.where looks right (int64):

>>> import platform
>>> platform.architecture()
('64bit', 'WindowsPE')
>>> import numpy as np
>>> np.__version__
'1.9.0b1'
>>> a = np.zeros(10)
>>> np.where(a == 0)
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int64),)

-- Olivier
Re: [Numpy-discussion] ANN: Pandas 0.14.0 Release Candidate 1
How does the build trigger? If it's just a matter of clicking on something when released, I think we can handle that :)

On Saturday, May 17, 2014 7:22:00 AM UTC-4, Jeff wrote:

> Hi,
>
> I'm pleased to announce the availability of the first release candidate of Pandas 0.14.0. Please try this RC and report any issues here: Pandas Issues https://github.com/pydata/pandas/issues We will be releasing officially in about 2 weeks or so.
>
> This is a major release from 0.13.1 and includes a small number of API changes, several new features, enhancements, and performance improvements, along with a large number of bug fixes. Highlights include:
>
> - Officially support Python 3.4
> - SQL interfaces updated to use sqlalchemy
> - Display interface changes
> - MultiIndexing Using Slicers
> - Ability to join a singly-indexed DataFrame with a multi-indexed DataFrame
> - More consistency in groupby results and more flexible groupby specifications
> - Holiday calendars are now supported in CustomBusinessDay
> - Several improvements in plotting functions, including hexbin, area and pie plots
> - Performance doc section on I/O operations
>
> Since there are some significant changes in the default way DataFrames are displayed, I have put up a comment issue looking for some feedback here: https://github.com/pydata/pandas/issues/7146
>
> Here are the full whatsnew and documentation links:
> v0.14.0 Whatsnew: http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html
> v0.14.0 Documentation Page: http://pandas-docs.github.io/pandas-docs-travis/
>
> Source tarballs and Windows builds are available here: Pandas v0.14rc1 Release https://github.com/pydata/pandas/releases
>
> A big thank you to everyone who contributed to this release!
>
> Jeff
Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

At 01:22 AM 7/25/2014, you wrote:

> Actually, the maximum precision I am not so sure of, as I personally prefer to make an informed decision about the precision used, and to get an error on a platform that does not support the specified precision, rather than obtain subtly or horribly broken results without warning when moving my code to a different platform, compiler, whatever.

We were talking about this in the office, as we realized it does affect a couple of lines dealing with large arrays, including complex64. As I expect Python modules to work uniformly cross-platform unless documented otherwise (to me that includes 32 vs 64 bit platforms), the modules should automatically use large enough accumulators for the input data type.

http://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html does mention the inaccuracy.

http://docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.stats.mstats.gmean.html
http://docs.scipy.org/doc/numpy/reference/generated/numpy.sum.html
etc. do not, exactly.

- Ray
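The complex64 case follows directly from the float32 one, since each complex64 value is a pair of float32 components. A hedged sketch (size illustrative again, and the drift appears only on builds that accumulate naively):

    import numpy as np

    z = np.ones(2**25, dtype=np.complex64)

    # Each float32 component of the accumulator saturates at 2**24,
    # exactly as in the real-valued case, under naive summation.
    print(z.mean())                     # can come out near (0.5+0j)
    print(z.mean(dtype=np.complex128))  # (1+0j): 64-bit components per part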
Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

On Fri, Jul 25, 2014 at 3:11 PM, RayS <r...@blue-cove.com> wrote:

> [snip]
>
> We were talking about this in the office, as we realized it does affect a couple of lines dealing with large arrays, including complex64. As I expect Python modules to work uniformly cross-platform unless documented otherwise (to me that includes 32 vs 64 bit platforms), the modules should automatically use large enough accumulators for the input data type.

The 32/64-bitness of your platform has nothing to do with floating point. Nothing discussed in this thread is platform-specific (modulo some minor details about the hardware FPU, but that should be taken as read).

-- Robert Kern
Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

At 07:22 AM 7/25/2014, you wrote:

> [snip]
>
> The 32/64-bitness of your platform has nothing to do with floating point.

As a naive end user, I can, and do, download different binaries for different CPUs/Windows versions, and will get different results: http://mail.scipy.org/pipermail/numpy-discussion/2014-July/070747.html

> Nothing discussed in this thread is platform-specific (modulo some minor details about the hardware FPU, but that should be taken as read).

And compilers, apparently. The important point was that it would be best if all of the methods affected by summing 32-bit floats with 32-bit accumulators had the same Notes as numpy.mean(). We went through a lot of code yesterday, assuming that any numpy or scipy.stats function that uses accumulators suffers the same issue, whether noted or not, and found it to be true.

"Depending on the input data, this can cause the results to be inaccurate, especially for float32 (see example below). Specifying a higher-precision accumulator using the dtype keyword (http://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.html#numpy.dtype) can alleviate this issue."

seems rather un-Pythonic.

- Ray
Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

Arguably, the whole of floating point numbers and their related shenanigans is not very Pythonic in the first place. The accuracy of the output WILL depend on the input, to some degree or another. At the risk of repeating myself: explicit is better than implicit.

-----Original Message-----
From: RayS <r...@blue-cove.com>
Sent: 25-7-2014 19:56
To: Discussion of Numerical Python <numpy-discussion@scipy.org>
Subject: Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

> At 07:22 AM 7/25/2014, you wrote:
> [snip]
>
> "Depending on the input data, this can cause the results to be inaccurate, especially for float32 (see example below). Specifying a higher-precision accumulator using the dtype keyword can alleviate this issue."
>
> seems rather un-Pythonic.
>
> - Ray
Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

On 7/25/2014 1:40 PM, Eelco Hoogendoorn wrote:

> At the risk of repeating myself: explicit is better than implicit

This sounds like an argument for renaming the `mean` function `naivemean` rather than `mean`. Whatever numpy names `mean`, shouldn't it implement an algorithm that produces the mean? And obviously, for any float data type, the mean value of the values in the array is representable as a value of the same type.

Alan Isaac
Re: [Numpy-discussion] [pydata] Re: ANN: Pandas 0.14.0 Release Candidate 1
Hi,

On Fri, Jul 25, 2014 at 9:52 AM, Jeff <jeffreb...@gmail.com> wrote:

> How does the build trigger? If it's just a matter of clicking on something when released, I think we can handle that :)

The two options are:

* I add you and whoever else does releases to my repo, and you can trigger builds by pressing a button on the travis page for my repo, or by pushing commits to the repo
* You take over the repo, I submit a pull request to make sure you have auth to upload to rackspace, and we proceed as above

But yes - single click - build.

Cheers,

Matthew
Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

On Fri, Jul 25, 2014 at 5:56 PM, RayS <r...@blue-cove.com> wrote:

> The important point was that it would be best if all of the methods affected by summing 32-bit floats with 32-bit accumulators had the same Notes as numpy.mean(). We went through a lot of code yesterday, assuming that any numpy or scipy.stats function that uses accumulators suffers the same issue, whether noted or not, and found it to be true.

Do you have a list of the functions that are affected?

> "Depending on the input data, this can cause the results to be inaccurate, especially for float32 (see example below). Specifying a higher-precision accumulator using the dtype keyword can alleviate this issue." seems rather un-Pythonic.

It's true that in its full generality, this problem just isn't something numpy can solve. Using float32 is extremely dangerous and should not be attempted unless you're prepared to seriously analyze all your code for numeric stability; IME it often runs into problems in practice, in any number of ways. Remember that it only has as much precision as a 24-bit integer. There are good reasons why float64 is the default!

That said, it does seem that np.mean could be implemented better than it is, even given float32's inherent limitations. If anyone wants to implement better algorithms for computing the mean, variance, sums, etc., then we would love to add them to numpy. I'd suggest implementing them as gufuncs -- there are examples of defining gufuncs in numpy/linalg/umath_linalg.c.src.

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
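One candidate for such a better mean is a streaming (Welford-style) update; a hedged pure-Python sketch (the function name and pure-Python form are mine; an actual numpy version would be a C gufunc as suggested above):

    import numpy as np

    def streaming_mean(x):
        # Welford-style update: the running value stays near the data's
        # own magnitude, so a float32 accumulator is not exhausted by a
        # huge running total the way a plain sum is.
        m = x.dtype.type(0)
        for n, v in enumerate(x, 1):
            m += (v - m) / x.dtype.type(n)
        return m

    x = np.ones(10**5, dtype=np.float32)
    print(streaming_mean(x))  # 1.0, with no float64 accumulator needed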
Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

It need not be exactly representable as such; take the mean of [1, 1+eps], for instance. Granted, there are at most two numbers in the range of the original dtype which are closest to the true mean; but I'm not sure that computing them exactly is a tractable problem for arbitrary input.

I'm not sure what is considered best practice for these problems, or if there is one, considering the heterogeneity of the problem. As noted earlier, summing a list of floating point values is a remarkably multifaceted problem once you get down into the details. I think it should be understood that all floating point algorithms are subject to floating point errors. As long as the algorithm used is specified, one can make an informed decision about whether the given algorithm will do what you expect of it. That's the best we can hope for.

If we are going to advocate doing 'clever' things behind the scenes, we have to keep backwards compatibility (not creating a possibility of producing worse results on the same input) and platform independence in mind. Funny summation orders could violate the former, depending on the implementation details, and 'using the highest machine precision available' violates the latter (and is horrible practice in general, imo: either you don't need the extra accuracy, or you do, and its absence on a given platform should be an error).

Perhaps pairwise summation in the original order of the data is the best option:

    q = np.ones((2,)*26, np.float32)
    print q.mean()
    while q.ndim > 0:
        q = q.mean(axis=-1, dtype=np.float32)
    print q

This only requires log(N) space on the stack if properly implemented, and is not platform dependent, nor should it have any backward compatibility issues that I can think of. But I'm not sure how easy it would be to implement, given the current framework. The ability to specify different algorithms per kwarg wouldn't be a bad idea either, imo; or the ability to explicitly specify a separate output and accumulator dtype.

On Fri, Jul 25, 2014 at 8:00 PM, Alan G Isaac <alan.is...@gmail.com> wrote:

> On 7/25/2014 1:40 PM, Eelco Hoogendoorn wrote:
>> At the risk of repeating myself: explicit is better than implicit
>
> This sounds like an argument for renaming the `mean` function `naivemean` rather than `mean`. Whatever numpy names `mean`, shouldn't it implement an algorithm that produces the mean? And obviously, for any float data type, the mean value of the values in the array is representable as a value of the same type.
>
> Alan Isaac
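The [1, 1+eps] point can be checked directly; a small sketch (eps here is float32 machine epsilon, 2**-23):

    import numpy as np

    eps = np.finfo(np.float32).eps
    x = np.array([1.0, 1.0 + eps], dtype=np.float32)

    # The true mean, 1 + eps/2, falls exactly halfway between the two
    # adjacent float32 values 1 and 1 + eps, so no float32 can hold it.
    exact = 1.0 + float(eps) / 2        # evaluated in float64
    print(np.float32(exact) == exact)   # False: rounding was forced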
Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

At 11:29 AM 7/25/2014, you wrote:

> [snip]
>
> Do you have a list of the functions that are affected?

We only tested a few we used, but scipy.stats.nanmean, or any .*mean(): numpy.sum, mean, average, std, var, ... via something like:

    import numpy
    import scipy.stats
    print numpy.__version__
    print scipy.__version__
    onez = numpy.ones((2**25, 1), numpy.float32)
    step = 2**10
    func = scipy.stats.nanmean
    for s in range(2**24 - step, 2**25, step):
        if func(onez[:s+step]) != 1.:
            print '\nbroke', s, func(onez[:s+step])
            break
        else:
            print '\r', s,

> That said, it does seem that np.mean could be implemented better than it is, even given float32's inherent limitations. If anyone wants to implement better algorithms for computing the mean, variance, sums, etc., then we would love to add them to numpy.

Others have pointed out the possible tradeoffs in summation algos; perhaps a method arg would be appropriate, "better" depending on your desire for speed vs. accuracy.

It just occurred to me that if the STScI folks (who count photons) took the mean() or other such func of an image array from Hubble sensors to find a background value, they'd better always be using float64.

- Ray
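For this kind of hunt, Python's math.fsum makes a handy reference, since it tracks exact partial sums internally and returns a correctly rounded float64 result; a hedged sketch along the lines of the loop above (fsum is slow, but dependable):

    import math
    import numpy as np

    onez = np.ones(2**25, dtype=np.float32)

    # A correctly rounded reference mean, independent of summation order.
    reference = math.fsum(onez) / len(onez)
    print(reference)                 # 1.0
    print(onez.mean() - reference)   # the accumulated error of mean(), if any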
Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

On Fri, Jul 25, 2014 at 4:25 PM, RayS <r...@blue-cove.com> wrote:

> [snip]
>
> Others have pointed out the possible tradeoffs in summation algos; perhaps a method arg would be appropriate, "better" depending on your desire for speed vs. accuracy.

I think this would be a good improvement. But it doesn't compensate for users needing to be aware of the problems. I think the docstring and the description of the dtype argument are pretty clear.

I'm largely with Eelco: whatever precision or algorithm we use, with floating point calculations we run into problems in some cases. And I don't think this is a broken function, but a design decision that takes the different tradeoffs into account. Whether it's the right decision is an open question, if there are better algorithms with essentially no disadvantages.

Two examples:

I had problems verifying some results against Stata at more than a few significant digits, until I realized that Stata had used float32 for the calculations by default in this case, while I was working with float64. Using single-precision linear algebra causes the same numerical problems as numpy.mean runs into.

A few years ago I tried to match some tougher NIST examples that were intentionally very badly scaled. numpy.mean at float64 had quite large errors, but a simple trick with two passes through the data managed to get very close to the certified NIST values.

My conclusion: don't use float32 unless you know you don't need any higher precision. Even with float64 it is possible to run into extreme cases where you get numerical garbage or large precision losses. However, in the large majority of cases a boring fast naive implementation is enough.

Also, whether we use mean, sum or dot in a calculation is an implementation detail, which in the case of dot doesn't have a dtype argument and always depends on the dtype of the arrays, AFAIK. Separating the accumulation dtype from the array dtype would require a lot of work except in the simplest cases, like those that only use sum and mean with a specified dtype argument.

Trying out the original example:

    >>> X = np.ones((5, 1024), np.float32)
    >>> X.mean()
    1.0
    >>> X.mean(dtype=np.float32)
    1.0
    >>> np.dot(X.ravel(), np.ones(X.ravel().shape) * 1. / X.ravel().shape)
    1.02299174
    >>> np.__version__
    '1.5.1'

(Win32)

Josef

> It just occurred to me that if the STScI folks (who count photons) took the mean() or other such func of an image array from Hubble sensors to find a background value, they'd better always be using float64.
>
> - Ray
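The "two passes through the data" trick is not spelled out in the thread; a hedged sketch of one common form of it, the corrected two-pass mean (function name and data are mine), which recenters the data around a first-pass estimate so the second pass sums only small residuals:

    import numpy as np

    def two_pass_mean(x):
        # First pass: a rough, possibly inaccurate estimate of the mean.
        m0 = x.mean(dtype=np.float64)
        # Second pass: residuals around m0 are small, so summing them
        # loses far less precision; the correction recovers lost digits.
        return m0 + (x - m0).mean(dtype=np.float64)

    # Badly scaled data: a tiny signal riding on a huge offset.
    x = 1e8 + np.random.normal(scale=1.0, size=10**6)
    print(two_pass_mean(x))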
Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

Ray: I'm not working with Hubble data, but yeah, these are all issues I've run into with my terabytes of microscopy data as well. Given that such raw data comes as uint16, it's best to do your calculations as much as possible in good old ints. What you compute is what you get, no obscure shenanigans.

It just occurred to me that pairwise summation will lead to highly branchy code, and you can forget about any vector extensions. Tradeoffs indeed. Any such hierarchical summation is probably best done by aggregating naively summed blocks.

-----Original Message-----
From: RayS <r...@blue-cove.com>
Sent: 25-7-2014 23:26
To: Discussion of Numerical Python <numpy-discussion@scipy.org>
Subject: Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

> At 11:29 AM 7/25/2014, you wrote:
> [snip]
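A hedged sketch of that block-aggregation idea (block size and names illustrative): naive, fully vectorizable sums inside fixed-size blocks, then a second naive sum over the much smaller array of partial sums.

    import numpy as np

    def blocked_sum(x, block=4096):
        # Error grows with the number of blocks, not the element count,
        # while each block sum remains a plain vectorizable loop.
        n = (len(x) // block) * block
        partials = x[:n].reshape(-1, block).sum(axis=1)
        return partials.sum() + x[n:].sum()

    x = np.ones(2**25, dtype=np.float32)
    print(blocked_sum(x) / len(x))  # 1.0, where a single naive sweep drifts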
Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

On 25.07.2014 23:51, Eelco Hoogendoorn wrote:

> Ray: I'm not working with Hubble data, but yeah, these are all issues I've run into with my terabytes of microscopy data as well. Given that such raw data comes as uint16, it's best to do your calculations as much as possible in good old ints. What you compute is what you get, no obscure shenanigans.

Integers are dangerous too: they overflow quickly, and signed overflow is even undefined in the C standard.

> It just occurred to me that pairwise summation will lead to highly branchy code, and you can forget about any vector extensions. Tradeoffs indeed. Any such hierarchical summation is probably best done by aggregating naively summed blocks.

Pairwise summation is usually implemented with a naive-sum cutoff large enough that the recursion does not matter much. In numpy 1.9 this cutoff is 128 elements, but the inner loop is unrolled 8 times, which makes it effectively 16 elements. The unrolling factor of 8 was intentionally chosen to allow using AVX in the inner loop without changing the summation ordering, but last I tested, actually using AVX here only gave mediocre speedups (10%-20% on an i5).

> From: RayS <r...@blue-cove.com>
> Sent: 25-7-2014 23:26
> [snip]
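A hedged pure-Python sketch of that scheme (the 128-element cutoff matches the description above; everything else, including using ndarray.sum as the leaf, is illustrative of the unrolled C loop, not a copy of it):

    import numpy as np

    def pairwise_sum(x, cutoff=128):
        # Below the cutoff, sum naively: recursion overhead would dominate,
        # and the error over at most 128 elements is modest. Above it,
        # halving keeps rounding-error growth at O(log n) rather than O(n).
        if len(x) <= cutoff:
            return x.sum(dtype=x.dtype)   # stands in for the unrolled loop
        mid = len(x) // 2
        return pairwise_sum(x[:mid], cutoff) + pairwise_sum(x[mid:], cutoff)

    x = np.ones(2**20, dtype=np.float32)
    print(pairwise_sum(x) / np.float32(len(x)))  # 1.0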
Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

At 02:36 PM 7/25/2014, you wrote:

> But it doesn't compensate for users needing to be aware of the problems. I think the docstring and the description of the dtype argument are pretty clear.

Most of the docs for the affected functions do not have a Note with the same warning as mean().

- Ray