Re: Optimal bin size for fitting histogram to normal pdf?

Greg Heath Wed, 15 Aug 2001 19:54:21 -0700
Date: Tue, 14 AUG 2001 16:27:11 +1000
From: Hong Ooi <[EMAIL PROTECTED]>

> On 13 Aug 2001 18:59:10 -0700, [EMAIL PROTECTED] (David
> Goldsmith) wrote:
> 
> >Aloha!  I'm fitting theoretically normally distributed data, of widely
> >differing sample sizes, to Gaussians by histogramming it and then using an
> >"off-the-shelf", third-party IDL routine.  Obviously, the "goodness" of
> >fit, as measured by the mse, is some function of the bin size used to
> >create the histogram.  Some numerical experiments I've run using IDL's
> >pseudo-normal-random number generator and "sample" sizes from 10^2 to
> >10^6.5 indicate that the "best" (that which minimizes the mse) bin size
> >(expressed as a multiple of the sample standard deviation) vs. log(sample
> >size) function is oscillatory, non-periodic.  I was hoping for
> >monotonicity so that I could create either a formula or at least a table
> >for this function; not having that, I used the observation that the values
> >seem to be bounded by 0.25 and 0.3 sigma, and, despite it being below any
> >actually observed value, chose 0.25 for "psychological" reasons.  
> >Unfortunately, this choice is not working uniformly well, (which actually
> >is not surprising given that the observed "good" range is about 20% of
> >this value).  My question for these groups is, does anyone know of any
> >theoretical results on this topic?  Thanks,
> 
> A standard result is that in terms of minimising MISE, the optimal binwidth
> for a histogram is O(n^{-1/3}), where n is the sample size. For normally
> distributed data, the formula is 3.491 x sigma x n^{-1/3}.

Use your favorite search engine to look up Sturges' rule, which sets the 
number of bins to about 1 + log2(n).
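
To make the two rules concrete, here is a minimal Python sketch (standard 
library only, not IDL) that computes the binwidth from the normal-data 
formula quoted above, 3.491 x sigma x n^{-1/3}, next to the bin count from 
Sturges' rule, k = 1 + log2(n), rounded up. The function names are my own 
and purely illustrative, and the sample standard deviation stands in for 
sigma.

```python
import math
import random
import statistics


def scott_binwidth(sample):
    """Binwidth 3.491 * s * n^(-1/3), with s the sample standard deviation."""
    n = len(sample)
    s = statistics.stdev(sample)
    return 3.491 * s * n ** (-1.0 / 3.0)


def sturges_bin_count(n):
    """Sturges' rule: k = 1 + log2(n) bins, rounded up to an integer."""
    return math.ceil(1 + math.log2(n))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for n in (100, 10_000, 100_000):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        s = statistics.stdev(sample)
        h = scott_binwidth(sample)
        k = sturges_bin_count(n)
        # Sturges gives a bin count; convert to a width via the sample range.
        h_sturges = (max(sample) - min(sample)) / k
        print(f"n={n:>7}  formula h/s={h / s:.3f}  "
              f"Sturges k={k}  Sturges h/s={h_sturges / s:.3f}")
```

Plugging in the sample sizes from the original question, the formula gives 
a binwidth of roughly 0.75 sigma at n = 10^2 and roughly 0.024 sigma at 
n = 10^6.5, so no fixed multiple of sigma can track it over that range. Keep 
in mind that the formula targets the MISE of the histogram as a density 
estimate, which is not necessarily the same criterion as the mse of a 
Gaussian curve fitted to the bin counts.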

> That said, if you know your data really is normally distributed, why do you
> need to fit a histogram anyway? The sample mean and variance give you the
> best possible estimate of the true density, without any need to use
> histograms or smoothers.

If you are trying to develop a technique to estimate confidence levels for 
a Gaussian hypothesis test, forget histograms. Sort the samples, 
determine the empirical CDF, and apply a statistical test such as 
Kolmogorov-Smirnov (K-S) or Anderson-Darling.
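
A minimal Python sketch of the sort-and-compare idea, again using only the 
standard library. It computes the K-S distance between the empirical CDF 
and a hypothesised normal CDF, plus the asymptotic Kolmogorov p-value. The 
function names are my own, and the p-value is only an approximation: it 
assumes the mean and sigma are specified in advance. If they are estimated 
from the same sample, the standard K-S critical values are too lenient and 
a Lilliefors-type correction is needed. Anderson-Darling is not shown.

```python
import math
import random
from statistics import NormalDist


def ks_statistic(sample, dist):
    """Largest gap between the empirical CDF of sample and dist.cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = dist.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_pvalue_asymptotic(d, n, terms=100):
    """Asymptotic Kolmogorov tail probability; rough for small n."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        return 1.0  # the tail probability is 1 to machine precision here
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so runs are repeatable
    sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    d = ks_statistic(sample, NormalDist(0.0, 1.0))
    print(f"D = {d:.4f}, approx p = {ks_pvalue_asymptotic(d, len(sample)):.3f}")
```

No binning is involved, so there is no bin size to tune.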

Greg (Brown '62)

Hope this helps.

Gregory E. Heath     [EMAIL PROTECTED]      The views expressed here are
M.I.T. Lincoln Lab   (781) 981-2815        not necessarily shared by
Lexington, MA        (781) 981-0908(FAX)   M.I.T./LL or its sponsors
02420-9185, USA
 


=================================================================
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
                  http://jse.stat.ncsu.edu/
=================================================================