2008/8/12 Joe Harrington <[EMAIL PROTECTED]>:
> So, I endorse extending min() and all other statistical routines to
> handle NaNs, possibly with a switch to turn it on if a suitably fast
> algorithm cannot be found (which is competitor IDL's solution).
> Certainly without a switch the default behavior should be to return
> NaN, not to return some random value, if a NaN is present. Otherwise
> the user may never know a NaN is present, and therefore has to check
> every use for NaNs. That constant manual NaN checking is slower and
> more error-prone than any numerical speed advantage.
>
> So to sum, proposed for statistical routines:
> if NaN is not present, return value
> if NaN is present, return NaN
> if NaN is present and nan=True, return value ignoring all NaNs
>
> OR:
> if NaN is not present, return value
> if NaN is present, return value ignoring all NaNs
> if NaN is present and nan=True, return NaN
>
> I'd prefer the latter. IDL does the former and it is a pain to do
> /nan all the time. However, the latter might trip up the unwary,
> whereas the former never does.
>
> This would apply at least to:
> min
> max
> sum
> prod
> mean
> median
> std
> and possibly many others.
For almost all of these the current behaviour is to propagate NaNs
arithmetically. For example, the sum of anything with a NaN is NaN. I
think this is perfectly sufficient, given how easy it is to strip out
NaNs if that's what you want.

The issue that started this thread (and the many other threads that
have come up as users stub their toes on this behaviour) is that min
(and other functions based on comparisons) do not propagate NaNs. If
you do np.amin(A) and A contains NaNs, you can't count on getting a
NaN back, unlike with np.mean or np.std. The fact that you get some
random value rather than the minimum just adds insult to injury. (It
is probably also true that the value you get back depends on how the
array is stored in memory.)

It really isn't very hard to replace np.sum(A) with
np.sum(A[~np.isnan(A)]) if you want to ignore NaNs instead of
propagating them. So I don't feel a need for special code in sum()
that treats NaN as 0. I would be content if the comparison-based
functions propagated NaNs appropriately.

If you did decide it was essential to make versions of the functions
that removed NaNs, it would get you most of the way there to add an
optional keyword argument to ufuncs' reduce method that skipped NaNs.

Anne
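
For concreteness, here is a minimal sketch of the two behaviours discussed above: stripping NaNs before a sum, and a minimum that always returns NaN when one is present. The helper names sum_ignoring_nans and propagating_min are hypothetical, made up for this illustration; they are not NumPy functions and not code from this thread. The explicit np.isnan check is only one possible way to get propagation, chosen so the example does not rely on how any particular NumPy version's min treats NaNs.

```python
import numpy as np

def sum_ignoring_nans(a):
    """Hypothetical helper: sum after stripping NaNs with a boolean mask."""
    a = np.asarray(a, dtype=float)
    return np.sum(a[~np.isnan(a)])

def propagating_min(a):
    """Hypothetical helper: minimum that is NaN whenever any element is NaN."""
    a = np.asarray(a, dtype=float)
    if np.isnan(a).any():
        # Explicit check, so the result never depends on element order
        # or on how comparisons against NaN happen to fall out.
        return np.nan
    return np.min(a)

A = np.array([3.0, np.nan, 1.0, 2.0])

print(np.sum(A))                         # nan (arithmetic propagation)
print(sum_ignoring_nans(A))              # 6.0
print(propagating_min(A))                # nan
print(propagating_min(A[~np.isnan(A)]))  # 1.0
```

The mask A[~np.isnan(A)] flattens a multi-dimensional array, so this sketch only covers reductions over the whole array, not reductions along an axis.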