[Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-05 Thread Bruce Southey
Hi, I have been investigating Ticket #605 'Incorrect behavior of numpy.histogram' (http://scipy.org/scipy/numpy/ticket/605 ). The fix for this ticket really depends on what the expectations are for the bin limits and different applications have different behavior. Consequently, I think that feedba

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-05 Thread James Philbin
The MATLAB behaviour is to extend the first bin to include all data down to -inf and extend the last bin to handle all data to inf. This is probably the behaviour with least surprise. Therefore, I would vote +1 for behaviour #1 by default, +1 for keeping the old behaviour #2 around as an option and
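A minimal sketch of the open-ended binning described here, using the sample data and limits that appear elsewhere in this thread. It relies on np.digitize and np.bincount rather than on numpy.histogram itself, and the helper name open_ended_counts is made up for illustration:

```python
import numpy as np

def open_ended_counts(values, inner_edges):
    """Count values in bins whose outer bins extend to -inf and +inf.

    Hypothetical helper for illustration; not part of NumPy.
    """
    inner_edges = np.asarray(inner_edges)
    # np.digitize returns 0 for x < inner_edges[0], k for
    # inner_edges[k-1] <= x < inner_edges[k], and len(inner_edges)
    # for x >= inner_edges[-1].
    idx = np.digitize(values, inner_edges)
    return np.bincount(idx, minlength=len(inner_edges) + 1)

r = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5])
counts = open_ended_counts(r, [3, 4])
print(counts)  # [3 3 9]
```

With the interior limits 3 and 4, everything below 3 lands in the first bin and everything from 4 upward lands in the last one.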

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-05 Thread Anne Archibald
On 05/04/2008, Bruce Southey <[EMAIL PROTECTED]> wrote:
> 1) Should the first bin contain all values less than or equal to the
> value of the first limit and the last bin contain all values greater
> than the value of the last limit?
> This produced the counts as: array([3, 3, 9]) (I termed th

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-06 Thread Tommy Grav
On Apr 5, 2008, at 2:01 PM, Bruce Southey wrote:
> Hi,
> I have been investigating Ticket #605 'Incorrect behavior of
> numpy.histogram' (http://scipy.org/scipy/numpy/ticket/605 ).
I think that my preference depends on the definition of what the bin number means. If the bin numbers are the lower

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-07 Thread Hans Meine
On Saturday, 05 April 2008 21:54:27, Anne Archibald wrote:
> There's also a fourth option - raise an exception if any points are
> outside the range.
+1 I think this should be the default. Otherwise, I tend towards "exclude", in order to have comparable bin sizes (when plotting, I always find
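The "raise an exception" option could look roughly like the sketch below. The helper check_within_edges is hypothetical and not part of NumPy; it assumes that "outside the range" means below the first limit or above the last one:

```python
import numpy as np

def check_within_edges(values, edges):
    """Raise ValueError if any value lies outside [edges[0], edges[-1]].

    Hypothetical helper for illustration; not part of NumPy.
    """
    values = np.asarray(values)
    lo, hi = edges[0], edges[-1]
    outside = (values < lo) | (values > hi)
    if outside.any():
        raise ValueError(
            f"{int(outside.sum())} value(s) fall outside [{lo}, {hi}]"
        )

r = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5])
try:
    check_within_edges(r, [2, 3, 4])
    raised = False
    message = ""
except ValueError as exc:
    raised = True
    message = str(exc)
print(raised, message)
```

A caller who wants out-of-range points kept or dropped silently would simply skip this check.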

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-07 Thread David Huard
+1 for an outlier keyword. Note that this implies that when bins are passed explicitly, the edges are given (nbins+1), not simply the left edges (nbins). While we are refactoring histogram, I'd suggest adding an axis keyword. This is pretty straightforward to implement using the np.apply_along_ax
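A rough sketch of both points, assuming the behaviour of current NumPy releases, where explicit bins are read as nbins+1 edges. The wrapper histogram_along_axis is a made-up name, not an existing NumPy function; it only shows how np.apply_along_axis could supply an axis argument:

```python
import numpy as np

def histogram_along_axis(a, edges, axis=-1):
    """Apply np.histogram to every 1-D slice of `a` along `axis`.

    Hypothetical helper for illustration; not part of NumPy.
    """
    def counts_1d(v):
        # np.histogram returns (counts, edges); keep only the counts.
        return np.histogram(v, bins=edges)[0]
    return np.apply_along_axis(counts_1d, axis, a)

data = np.array([[1, 2, 2, 3],
                 [3, 4, 4, 5]])
edges = [1, 3, 5]            # three edges define two bins
out = histogram_along_axis(data, edges, axis=1)
print(out.shape)             # one row of counts per row of data
```

Each row of the result has len(edges) - 1 entries, one per bin.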

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-07 Thread Bruce Southey
Hi, Thanks David for pointing out the piece of information I forgot to add in my original email. -1 for 'raise an exception' because, as Dan points out, the problem stems from the user providing bins. +1 for the outliers keyword. Should 'exclude' distinguish points that are too low and those that are too

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-07 Thread LB
+1 for axis and +1 for a keyword to define what to do with values outside the range. For the keyword, rather than 'outliers', I would propose 'discard' or 'exclude', because it could be used to describe the four possibilities:
- discard='low' => values lower than the range are discarded, va

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-07 Thread Tommy Grav
On Apr 7, 2008, at 4:14 PM, LB wrote:
> +1 for axis and +1 for a keyword to define what to do with values
> outside the range.
>
> For the keyword, rather than 'outliers', I would propose 'discard' or
> 'exclude', because it could be used to describe the four
> possibilities :
> - discard='low'

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-07 Thread David Huard
> On Apr 7, 2008, at 4:14 PM, LB wrote:
> > +1 for axis and +1 for a keyword to define what to do with values
> > outside the range.
> >
> > For the keyword, rather than 'outliers', I would propose 'discard' or
> > 'exclude', because it could be used to describe the four
> > possibilities :
> > - d

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-08 Thread Hans Meine
On Monday, 07 April 2008 14:34:08, Hans Meine wrote:
> On Saturday, 05 April 2008 21:54:27, Anne Archibald wrote:
> > There's also a fourth option - raise an exception if any points are
> > outside the range.
>
> +1
>
> I think this should be the default. Otherwise, I tend towards "exclude",
>

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-08 Thread David Huard
Hans, Note that the current histogram is buggy, in the sense that it assumes that all bins have the same width and computes db = bins[1]-bins[0]. This is why you get zeros everywhere. The current behavior has been heavily criticized and I think we should change it. My proposal is to have for histo
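The excerpt does not show how db is used, so the following is only an illustration of why a single width is wrong for unequal bins. It assumes the width is used to normalise counts into a density; the data and edges are made up:

```python
import numpy as np

data = np.array([0.5, 0.5, 2.0, 2.0])
edges = np.array([0.0, 1.0, 3.0])        # bin widths 1 and 2

counts, _ = np.histogram(data, bins=edges)
widths = np.diff(edges)                  # one width per bin

# Per-bin widths: the density integrates to 1.
density = counts / (counts.sum() * widths)
area = (density * widths).sum()

# Single width taken from the first bin only: the area is off.
naive = counts / (counts.sum() * (edges[1] - edges[0]))
naive_area = (naive * widths).sum()

print(area, naive_area)
```

Using np.diff on the edges gives the same densities as np.histogram(data, bins=edges, density=True) in current NumPy.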

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-08 Thread Bruce Southey
Hi, I agree that the current histogram should be changed. However, I am not sure 1.0.5 is the correct release for that. David, this doesn't work for your code:
r = np.array([1,2,2,3,3,3,4,4,4,4,5,5,5,5,5])
dbin = [2,3,4]
rc, rb = histogram(r, bins=dbin, discard=None)
Returns: rc=[3 3] # Really should

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-08 Thread David Huard
2008/4/8, Bruce Southey <[EMAIL PROTECTED]>:
> Hi,
> I agree that the current histogram should be changed. However, I am not
> sure 1.0.5 is the correct release for that.
We both agree.
> David, this doesn't work for your code:
> r= np.array([1,2,2,3,3,3,4,4,4,4,5,5,5,5,5])
> dbin=[2,3,4]
> rc,

Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-09 Thread Bruce Southey
Hi, I should have asked first (I hope that you don't mind), but I created Ticket #728 (http://scipy.org/scipy/numpy/ticket/728 ) for numpy.r_ because this incorrectly casts based on the array types. The bug is that -inf and inf are numpy floats but dbin is an array of ints. Unfortunate
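One way to sidestep the dtype question, sketched here as an assumption rather than as the fix adopted for the ticket, is to cast the integer limits to float explicitly before adding the infinite end points, and to use np.concatenate instead of numpy.r_:

```python
import numpy as np

dbin = np.array([2, 3, 4])               # integer limits, as in the thread

# Cast first so the result dtype does not depend on how mixed
# int/float inputs are promoted.
edges = np.concatenate(([-np.inf], dbin.astype(float), [np.inf]))

print(edges.dtype, edges)
```

The resulting array is float64, so -inf and inf survive unchanged alongside the original limits.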