On Wed, Apr 25, 2018 at 11:00 PM, Eric Wieser <wieser.eric+nu...@gmail.com> wrote:
> For precision loss of the order of float64 eps, I disagree.
>
> I was thinking more about precision loss on the order of 1, for large
> 64-bit integers that can’t fit in a float64

It's late and I'm probably missing something, but:

>>> np.iinfo(np.int64).max > np.finfo(np.float64).max
False

Either way, such weights don't really happen in real code I think.

> Note also that #10864 <https://github.com/numpy/numpy/issues/10864> incurs
> deliberate precision loss of the order 10**-6 x smallest bin, which is also
> much larger than eps.

Yeah that's worse.

> It’s also possible to refer users to scipy.stats.binned_statistic
>
> That sounds like a good idea to do irrespective of whether histogramdd has
> problems - I had no idea those existed. Is there a precedent for referring
> to more feature-rich scipy functions from the basic numpy ones?

Yes, there are cross-links to Python, SciPy and Matplotlib functions in the
docs. This is done with intersphinx
(https://github.com/numpy/numpy/blob/master/doc/source/conf.py#L215).
Example cross-link for convolve:
https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.convolve.html

Ralf
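P.S. The mechanism is roughly this (an illustrative sketch; the exact entries
in numpy's conf.py may differ):

    # doc/source/conf.py (illustrative values, not the verbatim numpy config)
    intersphinx_mapping = {
        'python': ('https://docs.python.org/dev', None),
        'scipy': ('https://docs.scipy.org/doc/scipy/reference', None),
        'matplotlib': ('https://matplotlib.org', None),
    }

With a mapping like that in place, a numpydoc "See Also" entry such as
scipy.stats.binned_statistic in the histogramdd docstring should resolve to
the SciPy documentation automatically, the same way the convolve page above
links to scipy.signal.fftconvolve.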
> On Wed, 25 Apr 2018 at 22:51 Ralf Gommers <ralf.gomm...@gmail.com> wrote:
>
>> On Wed, Apr 25, 2018 at 10:07 PM, Eric Wieser <wieser.eric+nu...@gmail.com>
>> wrote:
>>
>>> what does that gain over having the user do something like
>>> result.astype()
>>>
>>> It means that the user can use integer weights without worrying about
>>> losing precision due to an intermediate float representation.
>>>
>>> It also means they can use higher precision values (np.longdouble) or
>>> complex weights.
>>
>> None of that seems particularly important to be honest.
>>
>> you’re emitting warnings for everyone
>>>
>>> When there’s a risk of precision loss, that seems like the responsible
>>> thing to do.
>>
>> For precision loss of the order of float64 eps, I disagree. There will be
>> many such places in numpy and in other core libraries.
>>
>>> Users passing float weights would see no warning, I suppose.
>>>
>>> is this really worth a new function
>>>
>>> There ought to be a function for computing histograms with integer
>>> weights that doesn’t lose precision. Either we change the existing
>>> function to do that, or we make a new function.
>>
>> It's also possible to refer users to scipy.stats.binned_statistic(_2d/dd),
>> which provides a superset of the histogram functionality and is internally
>> consistent because the implementations of 1d/2d call the dd one.
>>
>> Ralf
>>
>>> A possible compromise: like 1, but only change the dtype of the result
>>> if a weights argument is passed.
>>>
>>> #10864 <https://github.com/numpy/numpy/issues/10864> seems like a
>>> worrying design flaw too, but I suppose that can be dealt with separately.
>>>
>>> Eric
>>>
>>> On Wed, 25 Apr 2018 at 21:57 Ralf Gommers <ralf.gomm...@gmail.com> wrote:
>>>
>>>> On Mon, Apr 9, 2018 at 10:24 PM, Eric Wieser <wieser.eric+nu...@gmail.com>
>>>> wrote:
>>>>
>>>>> Numpy has three histogram functions - histogram, histogram2d, and
>>>>> histogramdd.
>>>>>
>>>>> histogram is by far the most widely used, and in the absence of
>>>>> weights and normalization, returns an np.intp count for each bin.
>>>>>
>>>>> histogramdd (for which histogram2d is a wrapper) returns np.float64
>>>>> in all circumstances.
>>>>>
>>>>> As a contrived comparison:
>>>>>
>>>>> >>> x = np.linspace(0, 1)
>>>>> >>> h, e = np.histogram(x*x, bins=4); h
>>>>> array([25, 10, 8, 7], dtype=int64)
>>>>> >>> h, e = np.histogramdd((x*x,), bins=4); h
>>>>> array([25., 10., 8., 7.])
>>>>>
>>>>> https://github.com/numpy/numpy/issues/7845 tracks this inconsistency.
>>>>>
>>>>> The fix is now trivial: the question is, will changing the return type
>>>>> break people’s code?
>>>>>
>>>>> Either we should:
>>>>>
>>>>> 1. Just change it, and hope no one is broken by it
>>>>> 2. Add a dtype argument:
>>>>>    - If dtype=None, behave like np.histogram
>>>>>    - If dtype is not specified, emit a future warning recommending
>>>>>      to use dtype=None or dtype=float
>>>>>    - In future, change the default to None
>>>>> 3. Create a new better-named function histogram_nd, which can also
>>>>>    be created without the mistake that is
>>>>>    https://github.com/numpy/numpy/issues/10864.
>>>>>
>>>>> Thoughts?
>>>>
>>>> (1) seems like a no-go, taking such risks isn't justified by a minor
>>>> inconsistency.
>>>>
>>>> (2) is still fairly intrusive: you're emitting warnings for everyone and
>>>> still forcing people to change their code (and if they don't, they may
>>>> run into a backwards compat break).
>>>>
>>>> (3) is the best of these options, however, is this really worth a new
>>>> function? My vote would be "do nothing".
>>>>
>>>> Ralf
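A concrete note on the integer-weight precision point discussed above: the
loss Eric describes comes from exact representability, not range. Any integer
above 2**53 cannot be stored exactly in a float64, so routing int64 weights
through a float64 intermediate can silently change the result. A quick sketch
(plain float64 rounding, nothing histogram-specific):

>>> import numpy as np
>>> int(np.float64(2**53 + 1))          # 2**53 + 1 rounds back down to 2**53
9007199254740992
>>> w = np.array([2**60 + 1, 1], dtype=np.int64)
>>> w.sum()                             # exact integer accumulation
1152921504606846978
>>> int(w.astype(np.float64).sum())     # via a float64 intermediate: off by 2
1152921504606846976

That is, comparing against np.finfo(np.float64).max checks range rather than
exact representability; whether such large weights occur in real code is of
course a separate question.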
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion