> For precision loss of the order of float64 eps, I disagree.

I was thinking more about precision loss on the order of 1, for large 64-bit integers that can’t fit in a float64.
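A minimal sketch of that kind of loss, with values chosen purely for illustration (they do not come from the thread). It assumes only that 2**53 + 1 is the smallest positive integer that float64 cannot represent exactly:

```python
import numpy as np

# Illustrative value, not from the thread: 2**53 + 1 has no exact
# float64 representation, so converting it rounds to 2**53.
big = np.int64(2**53 + 1)
as_float = np.float64(big)

# Absolute error introduced by the round trip through float64.
round_trip_error = int(big) - int(as_float)

# The same effect when integer weights are accumulated as floats.
weights = np.array([2**53, 1], dtype=np.int64)
exact_sum = int(weights.sum())                     # integer accumulation
float_sum = int(weights.astype(np.float64).sum())  # float64 accumulation

print(round_trip_error, exact_sum - float_sum)
```

Both differences are an absolute error of 1 in what should be an exact integer result.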
Note also that #10864 <https://github.com/numpy/numpy/issues/10864> incurs deliberate precision loss of the order of 10**-6 x smallest bin, which is also much larger than eps.

> It’s also possible to refer users to scipy.stats.binned_statistic

That sounds like a good idea to do irrespective of whether histogramdd has problems - I had no idea those existed. Is there a precedent for referring to more feature-rich scipy functions from the basic numpy ones?

On Wed, 25 Apr 2018 at 22:51 Ralf Gommers <ralf.gomm...@gmail.com> wrote:

> On Wed, Apr 25, 2018 at 10:07 PM, Eric Wieser <wieser.eric+nu...@gmail.com> wrote:
>
>> what does that gain over having the user do something like result.astype()
>>
>> It means that the user can use integer weights without worrying about
>> losing precision due to an intermediate float representation.
>>
>> It also means they can use higher precision values (np.longdouble) or
>> complex weights.
>>
> None of that seems particularly important to be honest.
>
>> you’re emitting warnings for everyone
>>
>> When there’s a risk of precision loss, that seems like the responsible
>> thing to do.
>>
> For precision loss of the order of float64 eps, I disagree. There will be
> many such places in numpy and in other core libraries.
>
>> Users passing float weights would see no warning, I suppose.
>>
>> is this really worth a new function
>>
>> There ought to be a function for computing histograms with integer
>> weights that doesn’t lose precision. Either we change the existing function
>> to do that, or we make a new function.
>>
> It's also possible to refer users to scipy.stats.binned_statistic(_2d/dd),
> which provides a superset of the histogram functionality and is internally
> consistent because the implementations of 1d/2d call the dd one.
>
> Ralf
>
>> A possible compromise: like 1, but only change the dtype of the result if
>> a weights argument is passed.
>>
>> #10864 <https://github.com/numpy/numpy/issues/10864> seems like a
>> worrying design flaw too, but I suppose that can be dealt with separately.
>>
>> Eric
>>
>> On Wed, 25 Apr 2018 at 21:57 Ralf Gommers <ralf.gomm...@gmail.com> wrote:
>>
>>> On Mon, Apr 9, 2018 at 10:24 PM, Eric Wieser <wieser.eric+nu...@gmail.com> wrote:
>>>
>>>> Numpy has three histogram functions - histogram, histogram2d, and
>>>> histogramdd.
>>>>
>>>> histogram is by far the most widely used, and in the absence of
>>>> weights and normalization, returns an np.intp count for each bin.
>>>>
>>>> histogramdd (for which histogram2d is a wrapper) returns np.float64 in
>>>> all circumstances.
>>>>
>>>> As a contrived comparison
>>>>
>>>> >>> x = np.linspace(0, 1)
>>>> >>> h, e = np.histogram(x*x, bins=4); h
>>>> array([25, 10, 8, 7], dtype=int64)
>>>> >>> h, e = np.histogramdd((x*x,), bins=4); h
>>>> array([25., 10., 8., 7.])
>>>>
>>>> https://github.com/numpy/numpy/issues/7845 tracks this inconsistency.
>>>>
>>>> The fix is now trivial: the question is, will changing the return type
>>>> break people’s code?
>>>>
>>>> Either we should:
>>>>
>>>> 1. Just change it, and hope no one is broken by it
>>>> 2. Add a dtype argument:
>>>>    - If dtype=None, behave like np.histogram
>>>>    - If dtype is not specified, emit a future warning recommending
>>>>      to use dtype=None or dtype=float
>>>>    - In future, change the default to None
>>>> 3. Create a new better-named function histogram_nd, which can also
>>>>    be created without the mistake that is
>>>>    https://github.com/numpy/numpy/issues/10864.
>>>>
>>>> Thoughts?
>>>>
>>>
>>> (1) seems like a no-go, taking such risks isn't justified by a minor
>>> inconsistency.
>>>
>>> (2) is still fairly intrusive, you're emitting warnings for everyone and
>>> still force people to change their code (and if they don't they may run
>>> into a backwards compat break).
>>>
>>> (3) is the best of these options, however is this really worth a new
>>> function?
>>> My vote would be "do nothing".
>>>
>>> Ralf
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
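The contrived comparison quoted above can also be run as a small script. This is only a sketch: the dtype that histogramdd returns is whatever the installed NumPy version produces, and the cast to np.intp is the result.astype() workaround mentioned in the thread, not a proposed API.

```python
import numpy as np

x = np.linspace(0, 1)  # 50 evenly spaced samples, as in the quoted example

# 1-D histogram: integer counts when no weights are passed.
h1, _ = np.histogram(x * x, bins=4)

# N-D histogram of the same data, passed as a one-element tuple.
hdd, _ = np.histogramdd((x * x,), bins=4)

# Workaround from the thread: cast the histogramdd result to integers.
counts_dd = hdd.astype(np.intp)

print(h1.dtype, hdd.dtype)
print(np.array_equal(h1, counts_dd))
```

The cast is lossless here because the unweighted counts are small whole numbers. It does not help when precision has already been lost inside the float accumulation, which is the case raised for large integer weights.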