Hi John, I tested your example on git version 19e56b3221e1008ad4 and see the following plot:
Which seems OK to me. If you count all bins in your example, is the sum 10? That is the first problem, and it should be fixed by the mentioned commit. In these tests the case values are very close to, or exactly at, the bin limits. So a case could land in one bin or the other due to numerical rounding when the bin limits are computed.

See my comments on your fix below.

> On 15.02.2016 at 21:15, John Darrington <[email protected]> wrote:
>
> On Mon, Feb 15, 2016 at 06:52:03PM +0000, Friedrich Beckmann wrote:
>
>      I fixed this problem with commit
>
>      http://git.savannah.gnu.org/cgit/pspp.git/commit/?id=ca4012bcf0f8790ceb8539b55bbc296d0802d5d7
>
>      Now all cases are considered in the histogram.
>
> I don't think this is the right fix.
>
> There will still be a problem in the case where max == adjusted_max
>
> For example:
>
> data list list /x *.
> begin data.
> 1
> 2
> 3
> 4
> 5
> 6
> 7
> 8
> 9
> 10
> end data.
>
> examine x
>  /plot = histogram.
>
> The last bin has 3 items and thus distorts the histogram.
>
>
> I was going to suggest a fix like this:
>
> From 8e381363c45e8be168d742bcdf2debf17c690ba4 Mon Sep 17 00:00:00 2001
> From: John Darrington <[email protected]>
> Date: Mon, 15 Feb 2016 21:05:09 +0100
> Subject: [PATCH] Fix for missing bin
>
> ---
>  src/math/histogram.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/src/math/histogram.c b/src/math/histogram.c
> index 9158590..c69006b 100644
> --- a/src/math/histogram.c
> +++ b/src/math/histogram.c
> @@ -143,6 +143,12 @@ histogram_create (double bin_width_in, double min, double max)
>
>    h = xmalloc (sizeof *h);
>
> +  if (adjusted_max >= max)
> +    {
> +      adjusted_max += (adjusted_max - adjusted_min) / bins;
> +      bins++;
> +    }
> +
>    h->gsl_hist = gsl_histogram_alloc (bins);

This fix would always add a bin, because adjusted_max should always be greater than or equal to max. But it may be a good idea to make sure that adjusted_max is always > max. Then the gsl_histogram binning would always include the cases whose value is max.

>
>    gsl_histogram_set_ranges_uniform (h->gsl_hist, adjusted_min, adjusted_max);
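
To illustrate why a case with value == max can drop out when the upper limit equals max, here is a minimal sketch. It is not the PSPP code and does not call GSL. bin_index and count_binned are hypothetical helpers that assume uniform bins which include their lower limit and exclude their upper limit, which is how the GSL documentation describes gsl_histogram bins. The limits used in the note after the code are illustrative assumptions, not the values histogram_create actually computes.

```c
#include <stddef.h>

/* Hypothetical helper, not part of PSPP or GSL: index of the uniform bin
   that holds x, or -1 if x lies outside [lo, hi).  Each bin includes its
   lower limit and excludes its upper limit. */
static int
bin_index (double x, double lo, double hi, size_t bins)
{
  if (x < lo || x >= hi)
    return -1;

  double width = (hi - lo) / bins;
  size_t idx = (size_t) ((x - lo) / width);

  /* Guard against rounding that pushes a value just below hi
     past the last bin. */
  if (idx >= bins)
    idx = bins - 1;
  return (int) idx;
}

/* Fills counts[0..bins-1] and returns how many of the n values
   ended up in some bin. */
static size_t
count_binned (const double *values, size_t n, double lo, double hi,
              size_t bins, size_t *counts)
{
  size_t binned = 0;

  for (size_t i = 0; i < bins; i++)
    counts[i] = 0;

  for (size_t i = 0; i < n; i++)
    {
      int idx = bin_index (values[i], lo, hi, bins);
      if (idx >= 0)
        {
          counts[idx]++;
          binned++;
        }
    }
  return binned;
}
```

With the ten values 1 to 10 from the example, assumed limits lo = 1, hi = 10 and 9 bins place only 9 cases, because 10 sits on the excluded upper limit. With hi = 11 and 10 bins all 10 cases are counted, one per bin. That is the effect of keeping the upper limit strictly above max.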
