On Mon, Aug 30, 2010 at 3:02 PM, <josef.p...@gmail.com> wrote:

> On Mon, Aug 30, 2010 at 2:43 PM, Benjamin Root <ben.r...@ou.edu> wrote:
> > On Mon, Aug 30, 2010 at 10:50 AM, <josef.p...@gmail.com> wrote:
> >>
> >> On Mon, Aug 30, 2010 at 11:39 AM, Bruce Southey <bsout...@gmail.com> wrote:
> >> > On 08/30/2010 09:19 AM, Benjamin Root wrote:
> >> >
> >> > On Mon, Aug 30, 2010 at 8:29 AM, David Huard <david.hu...@gmail.com> wrote:
> >> >>
> >> >> Thanks for the feedback,
> >> >> As far as I understand it, the proposition is to keep histogram as it is for 1.5, then in 2.0, deprecate normed=True but keep the buggy behavior, while adding a density keyword that fixes the bug. In a later release, we could then get rid of normed. While the bug won't be present in histogramdd and histogram2d, the keyword change should be mirrored in those functions as well.
> >> >> I personally am not too keen on changing the keyword normed for density. I feel we are trading clarity for a few new users against additional trouble for many existing users. We could mitigate this by first documenting the change in the docstring and living with both keywords for a few years before raising a DeprecationWarning.
> >> >> Since this has a direct impact on matplotlib's hist, I'd be keen to hear from the devs on this.
> >> >> David
> >> >
> >> > I am not a dev, but I would like to give a word of warning from matplotlib.
> >> >
> >> > In matplotlib, the bar/hist family of functions grew organically as the devs took on various requests to add keywords and such to modify the style and behavior of those graphing functions. It has now become an unmaintainable mess, prompting discussions on how to rip it out and replace it with a cleaner implementation.
> >> > While everyone agrees that it needs to be done, none of us wants to break backwards compatibility.
> >> >
> >> > My personal feeling is that a function should do one thing, and do that one thing well. So, to me, that means that histogram() should return an array of counts and the bins for those counts. Anything more is merely window dressing to me. With this information, one can easily compute a cumulative distribution function, and/or normalize the result. The idea is that if there is nothing special that needs to be done within the histogram algorithm to accommodate these extra features, then they belong outside the function.
> >> >
> >> > My 2 cents,
> >> > Ben Root
> >> >
> >> > _______________________________________________
> >> > NumPy-Discussion mailing list
> >> > NumPy-Discussion@scipy.org
> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >> >
> >> > +1 for Ben's approach.
> >> > This is very similar to my view regarding the contingency table class proposed for scipy ( http://projects.scipy.org/scipy/ticket/1258). We need to provide the core functionality that other approaches such as density estimation can use but not be limited to specific details.
> >>
> >> I think (a corrected) density histogram is core functionality for unequal bin lengths.
> >>
> >> The graph with raw count in the case of unequal bin sizes would be quite misleading when plotted and interpreted on the real line and not on discrete points (shaded areas instead of vertical lines). And as the origin of this thread showed, it's not trivial to figure out what the correct normalization is.
> >> So, I think, if we drop the density normalization, we just need a new function that does it.
> >>
> >> My 2c,
> >>
> >> Josef
> >>
> >
> > Why not a function that takes the output of a core histogram and produces a correct density normalization? Such a function would be useful elsewhere, I imagine.
> >
> > Of course there are a lot of legacy issues to consider, but if we introduce such a function first with documentation in histogram() showing how to produce a normalized density, we can then keep some of the bad code for now for backwards compatibility with notes saying that some of the stuff will be deprecated. Especially point out in the docs where the current code fails to produce the correct results.
>
> bugfix or redesign ?
>
> My feature request for (or target for forking) the histogram functions is to get the temporary results out, or get additional results, for example the bin-number or quantization for each observation, or some other things that I don't remember right now.
>
> With histogram functions that only do histograms, we lose a lot of calculations. This is, however, not really relevant for calculating densities since the bin edges are returned.

Not sure I'm understanding what you mean by this, but if you look at the code, you'll see that histogram is basically a big wrapper around a one-liner: np.diff(np.searchsorted(np.sort(data), bins)). Most of the code is there to make this one-liner user-friendly, improve performance or handle weights.
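A minimal sketch of that one-liner, together with the kind of stand-alone density normalization Ben suggests, might look roughly like the following. The helper names counts_from_searchsorted and density_from_counts are hypothetical and are not part of numpy. The sketch also assumes that the last bin should include its right edge, which is why it looks up the final edge with side="right" instead of using the bare one-liner.

```python
import numpy as np

def counts_from_searchsorted(data, edges):
    """Count samples per bin via sort + searchsorted (hypothetical helper)."""
    sorted_data = np.sort(np.asarray(data, dtype=float))
    edges = np.asarray(edges, dtype=float)
    # Left edges are inclusive: number of samples strictly below each left edge.
    left = np.searchsorted(sorted_data, edges[:-1], side="left")
    # Assumption: the final bin is closed on the right, so count samples <= last edge.
    right = np.searchsorted(sorted_data, edges[-1], side="right")
    return np.diff(np.append(left, right))

def density_from_counts(counts, edges):
    """Scale counts so the histogram integrates to 1 (hypothetical helper)."""
    counts = np.asarray(counts, dtype=float)
    widths = np.diff(np.asarray(edges, dtype=float))
    # Dividing by each bin's own width is what makes unequal bins come out right.
    return counts / (counts.sum() * widths)

data = [0.5, 1.5, 1.7, 2.5, 3.0, 6.0, 9.0, 10.0]
edges = [0.0, 1.0, 2.0, 4.0, 10.0]  # deliberately unequal bin widths

counts = counts_from_searchsorted(data, edges)
density = density_from_counts(counts, edges)
print(counts)
print(density)
```

With only counts and edges as inputs, the normalization needs nothing from inside the histogram algorithm, which is the point made earlier in the thread about keeping it outside the core function.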
I just added a warning alerting concerned users (r8674), so this takes care of the bug fix and Nils' wish to avoid a silent change in behavior. These two changes could be included in 1.5 if Ralf feels this is worthwhile.

Cheers,

David H.

> Josef
>
> > Ben Root