2008/4/8, Bruce Southey <[EMAIL PROTECTED]>:
>
> Hi,
> I agree that the current histogram should be changed. However, I am not
> sure 1.0.5 is the correct release for that.
We both agree.

> David, this doesn't work for your code:
> r= np.array([1,2,2,3,3,3,4,4,4,4,5,5,5,5,5])
> dbin=[2,3,4]
> rc, rb=histogram(r, bins=dbin, discard=None)
> Returns:
> rc=[3 3] # Really should be [3, 3, 9]
> rb=[-9223372036854775808 3 -9223372036854775808]

I used the convention that bins are the bin edges, including the rightmost
edge; this is why len(rc)=2 and len(rb)=3.

Now there clearly is a bug, and I traced it to the use of np.r_. Check this out:

In [26]: dbin = [1,2,3]

In [27]: np.r_[-np.inf, dbin, np.inf]
Out[27]: array([-Inf, 1., 2., 3., Inf])

In [28]: np.r_[-np.inf, asarray(dbin), np.inf]
Out[28]: array([-9223372036854775808, 1, 2, 3, -9223372036854775808])

In [29]: np.r_[-np.inf, asarray(dbin).astype(float), np.inf]
Out[29]: array([-Inf, 1., 2., 3., Inf])

Is this a misuse of r_ or a bug?

David

> But I have not had time to find the error.
>
> Regards
> Bruce
>
> David Huard wrote:
> > Hans,
> >
> > Note that the current histogram is buggy, in the sense that it assumes
> > that all bins have the same width and computes db = bins[1]-bins[0].
> > This is why you get zeros everywhere.
> >
> > The current behavior has been heavily criticized and I think we should
> > change it. My proposal is to have for histogram the same behavior as
> > for histogramdd and histogram2d: bins are the bin edges, including the
> > rightmost bin, and values outside of the bins are not tallied. The
> > problem with this is that it breaks code, and I'm not sure it's such a
> > good idea to do this in a point release.
> >
> > My short term proposal would be to fix the normalization bug and
> > document the current behavior of histogram for the 1.0.5 release. Once
> > it's done, we can modify histogram and maybe print a warning the first
> > time it's used to notify users of the change.
> >
> > I'd like to hear the voice of experienced devs on this. This issue has
> > been raised a number of times since I follow this ML.
> > It's not the first time I've proposed patches, and I've already
> > documented the weird behavior only to see the comments disappear after
> > a while. I hope this time some kind of agreement will be reached.
> >
> > Regards,
> >
> > David
> >
> > 2008/4/8, Hans Meine <[EMAIL PROTECTED]>:
> > > On Monday, 07 April 2008 14:34:08, Hans Meine wrote:
> > > > On Saturday, 05 April 2008 21:54:27, Anne Archibald wrote:
> > > > > There's also a fourth option - raise an exception if any
> > > > > points are outside the range.
> > > >
> > > > +1
> > > >
> > > > I think this should be the default. Otherwise, I tend towards
> > > > "exclude", in order to have comparable bin sizes (when plotting, I
> > > > always find peaks at the ends annoying); this could also be called
> > > > "clip" BTW.
> > > >
> > > > But really, an exception would follow the Zen: "In the face of
> > > > ambiguity, refuse the temptation to guess." And with a kwarg:
> > > > "Explicit is better than implicit."
> > >
> > > When posting this, I did indeed not think this through fully; as
> > > David (and Tommy) pointed out, this API does not fit well with the
> > > existing `bins` option, especially when a sequence of bin bounds is
> > > given. (I guess I was mostly thinking about the special case of
> > > discrete values and 1:1 bins, as typical for uint8 data.)
> > >
> > > Thus, I would like to withdraw my above opinion and instead state
> > > that I find the current API as clear as it gets. If you want to
> > > exclude values, simply pass an additional right bound, and for
> > > including outliers, passing -inf as additional left bound seems to do
> > > the trick. This could possibly be added to the documentation though.
> > >
> > > The only critical aspect I see is the `normed` arg.
> > > As it is now, the rightmost bin always has infinite size, but it is
> > > not treated like that:
> > >
> > > In [1]: from numpy import *
> > >
> > > In [2]: histogram(arange(10), [2,3,4], normed = True)
> > > Out[2]: (array([ 0.1, 0.1, 0.6]), array([2, 3, 4]))
> > >
> > > Even worse, if you try to add an infinite bin to the left, this
> > > pulls all values to zero (technically, I understand that, but it
> > > looks really undesirable to me):
> > >
> > > In [3]: histogram(arange(10), [-inf, 2,3,4], normed = True)
> > > Out[3]: (array([ 0., 0., 0., 0.]), array([-Inf, 2., 3., 4.]))
> > >
> > > --
> > > Ciao, /  /
> > >      /--/
> > >     /  / ANS
> > > _______________________________________________
> > > Numpy-discussion mailing list
> > > Numpy-discussion@scipy.org
> > > http://projects.scipy.org/mailman/listinfo/numpy-discussion
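
For reference, here is a minimal pure-Python sketch of the convention proposed in the quoted thread: bins are given as edges, the rightmost edge is included, values outside the edges are not tallied, and normalization divides by each bin's own width rather than a single db. The helper name histogram_by_edges and its density flag are hypothetical, chosen only for illustration; this is not numpy's implementation.

```python
from bisect import bisect_right


def histogram_by_edges(values, edges, density=False):
    """Tally values into bins defined by sorted edges (hypothetical helper).

    Bin i covers [edges[i], edges[i+1]); the last bin also includes its
    right edge. Values outside [edges[0], edges[-1]] are not tallied.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # outliers are dropped, not folded into an end bin
        # bisect_right returns the index of the first edge greater than v;
        # min() keeps a value equal to the last edge in the last bin.
        i = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[i] += 1
    if not density:
        return counts
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    # Divide by each bin's own width so unequal bins are handled.
    return [c / (total * (edges[i + 1] - edges[i]))
            for i, c in enumerate(counts)]


r = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5]
print(histogram_by_edges(r, [2, 3, 4]))  # [2, 7]

# Evenly spread data in unequal bins: each density times its width sums to 1.
print(histogram_by_edges(range(10), [0, 2, 10], density=True))
```

With edges [2, 3, 4] the values 1 and 5 are ignored, the two 2s land in the first bin, and the 3s and 4s share the closed last bin.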
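
On the np.r_ question above: one way to sidestep the integer cast, sketched under the assumption that the padded edges only need to be floats, is to convert the bins to float first and join the pieces with np.concatenate. The variable name padded is made up for illustration.

```python
import numpy as np

dbin = [1, 2, 3]

# Convert the integer edges to float before padding, so the infinities
# never have to be represented in an integer array.
edges = np.asarray(dbin, dtype=float)
padded = np.concatenate(([-np.inf], edges, [np.inf]))

print(padded.dtype)
print(padded)
```

This mirrors the astype(float) call in In [29] above; it does not answer whether the In [28] result is a bug in r_ or a misuse of it.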