I tend to agree with Josef here. To me, bincount and digitize are the low-level functions, and histogram contains a bit more functionality since it's used so often and for so many use cases. My guess is that if we removed the normalization, it could annoy a lot of people and would quickly appear on the desired-feature list.
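
To make the layering concrete, here is a rough sketch of how the counts can be built from digitize and bincount, and how a density normalization for non-uniform bins could be applied on top. This is only an illustration, not the actual numpy implementation: the helper name density_from_counts, the bin edges, and the sample data are all made up for the example.

```python
import numpy as np

def density_from_counts(counts, edges):
    """Hypothetical helper: scale raw counts so that the
    histogram integrates to 1 over the bin edges."""
    counts = np.asarray(counts, dtype=float)
    widths = np.diff(edges)                  # per-bin widths, may be unequal
    return counts / (counts.sum() * widths)  # divide by N and by each width

# Made-up data with non-uniform bins, the case discussed in this thread.
edges = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
data = np.array([0.5, 0.7, 1.2, 1.9, 2.5, 3.0, 5.0, 7.5])

# Low-level route: digitize gives a 1-based bin index for each sample,
# bincount tallies them. This simple form assumes every sample lies in
# [edges[0], edges[-1]), which holds for the data above.
idx = np.digitize(data, edges) - 1
counts_low = np.bincount(idx, minlength=len(edges) - 1)

# Higher-level route: histogram returns the same raw counts.
counts, _ = np.histogram(data, bins=edges)

density = density_from_counts(counts, edges)
area = (density * np.diff(edges)).sum()      # should come out as 1.0
```

With unequal widths, dividing by the total count alone would not give unit area; the per-bin width has to enter the normalization as well.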

Just to put things in perspective, this was indeed a trivial bug that required a one-line fix. It only affected use cases combining non-uniform bin widths with normed=True, which is probably uncommon. I believe it is a genuine bug, not just confusing behavior, and that's why I initially thought a warning was unnecessary. In any case, I'm not sure this is really a "while we're at it" situation; I think the switch from "normed" to "density" should be addressed in another context. That would allow us to include the bug fix (with a warning) in the upcoming 1.5 release.

David H.

On Mon, Aug 30, 2010 at 11:50 AM, <josef.p...@gmail.com> wrote:

> On Mon, Aug 30, 2010 at 11:39 AM, Bruce Southey <bsout...@gmail.com> wrote:
> > On 08/30/2010 09:19 AM, Benjamin Root wrote:
> >
> > On Mon, Aug 30, 2010 at 8:29 AM, David Huard <david.hu...@gmail.com> wrote:
> >>
> >> Thanks for the feedback,
> >> As far as I understand it, the proposition is to keep histogram as it is for 1.5, then in 2.0, deprecate normed=True but keep the buggy behavior, while adding a density keyword that fixes the bug. In a later release, we could then get rid of normed. While the bug won't be present in histogramdd and histogram2d, the keyword change should be mirrored in those functions as well.
> >> I personally am not too keen on changing the keyword normed for density. I feel we are trading clarity for a few new users against additional trouble for many existing users. We could mitigate this by first documenting the change in the docstring and living with both keywords for a few years before raising a DeprecationWarning.
> >> Since this has a direct impact on matplotlib's hist, I'd be keen to hear from the devs on this.
> >> David
> >
> > I am not a dev, but I would like to give a word of warning from matplotlib.
> >
> > In matplotlib, the bar/hist family of functions grew organically as the devs took on various requests to add keywords and such to modify the style and behavior of those graphing functions. It has now become an unmaintainable mess, prompting discussions on how to rip it out and replace it with a cleaner implementation. While everyone agrees that it needs to be done, none of us wants to break backwards compatibility.
> >
> > My personal feeling is that a function should do one thing, and do that one thing well. So, to me, that means that histogram() should return an array of counts and the bins for those counts. Anything more is merely window dressing to me. With this information, one can easily compute a cumulative distribution function, and/or normalize the result. The idea is that if there is nothing special that needs to be done within the histogram algorithm to accommodate these extra features, then they belong outside the function.
> >
> > My 2 cents,
> > Ben Root
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> > +1 for Ben's approach.
> > This is very similar to my view regarding the contingency table class proposed for scipy (http://projects.scipy.org/scipy/ticket/1258). We need to provide the core functionality that other approaches such as density estimation can use, without being limited to specific details.
>
> I think (a corrected) density histogram is core functionality for unequal bin lengths.
>
> The graph with raw counts in the case of unequal bin sizes would be quite misleading when plotted and interpreted on the real line and not on discrete points (shaded areas instead of vertical lines). And as the origin of this thread showed, it's not trivial to figure out what the correct normalization is.
> So, I think, if we drop the density normalization, we just need a new function that does it.
>
> My 2c,
>
> Josef
>
> > Bruce