Date: Tue, 14 AUG 2001 16:27:11 +1000
From: Hong Ooi <[EMAIL PROTECTED]>
> On 13 Aug 2001 18:59:10 -0700, [EMAIL PROTECTED] (David
> Goldsmith) wrote:
>
> >Aloha! I'm fitting theoretically normally distributed data, of widely
> >differing sample sizes, to Gaussians by histogramming it and then using an
> >"off-the-shelf", third-party IDL routine. Obviously, the "goodness" of
> >fit, as measured by the mse, is some function of the bin size used to
> >create the histogram. Some numerical experiments I've run using IDL's
> >pseudo-normal-random number generator and "sample" sizes from 10^2 to
> >10^6.5 indicate that the "best" (that which minimizes the mse) bin size
> >(expressed as a multiple of the sample standard deviation) vs. log(sample
> >size) function is oscillatory, non-periodic. I was hoping for
> >monotonicity so that I could create either a formula or at least a table
> >for this function; not having that, I used the observation that the values
> >seem to be bounded by 0.25 and 0.3 sigma, and, despite it being below any
> >actually observed value, chose 0.25 for "psychological" reasons.
> >Unfortunately, this choice is not working uniformly well (which actually
> >is not surprising given that the observed "good" range is about 20% of
> >this value). My question for these groups is, does anyone know of any
> >theoretical results on this topic? Thanks,
>
> A standard result is that in terms of minimising MISE, the optimal binwidth
> for a histogram is O(n^{-1/3}), where n is the sample size. For normally
> distributed data, the formula is 3.491 x sigma x n^{-1/3}.
Use your favorite search engine to look up Sturges' rule.
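
For anyone who wants to try the two rules side by side, here is a minimal sketch in Python (standard library only). The function names are my own, and the Sturges form used, ceil(log2(n)) + 1 bins, is one common statement of the rule; treat this as an illustration rather than a reference implementation.

```python
import math
import random
import statistics


def scott_bin_width(samples):
    """Bin width from the normal-reference rule quoted above: 3.491 * s * n**(-1/3)."""
    n = len(samples)
    s = statistics.stdev(samples)  # sample standard deviation (n - 1 divisor)
    return 3.491 * s * n ** (-1.0 / 3.0)


def sturges_bin_count(n):
    """Number of bins from Sturges' rule, taken here as ceil(log2(n)) + 1."""
    return math.ceil(math.log2(n)) + 1


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, only so that runs are repeatable
    for n in (100, 1_000, 10_000):
        data = [rng.gauss(0.0, 1.0) for _ in range(n)]
        width_in_sigmas = scott_bin_width(data) / statistics.stdev(data)
        print(n, round(width_in_sigmas, 3), sturges_bin_count(n))
```

Note that under the n^{-1/3} rule the width in units of the sample standard deviation is just 3.491 n^{-1/3}, which falls from about 0.75 at n = 100 to about 0.035 at n = 10^6, so no single fixed multiple of sigma can be right across that range of sample sizes.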
> That said, if you know your data really is normally distributed, why do you
> need to fit a histogram anyway? The sample mean and variance give you the
> best possible estimate of the true density, without any need to use
> histograms or smoothers.
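
The direct parametric fit described above might look like the following sketch in Python (standard library only). The helper name fit_normal is hypothetical, and it uses the n - 1 divisor for the standard deviation, which is an assumption on my part; the maximum-likelihood estimate divides by n instead.

```python
import statistics


def fit_normal(samples):
    """Fit a Gaussian by plugging in the sample mean and standard deviation."""
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)  # n - 1 divisor; an assumption, see above
    return statistics.NormalDist(mu, sigma)


if __name__ == "__main__":
    fitted = fit_normal([4.1, 5.3, 4.8, 5.9, 5.0, 4.6])
    # The fitted density can be evaluated anywhere, with no binning involved.
    print(fitted.mean, fitted.stdev, fitted.pdf(5.0))
```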
If you are trying to develop a technique to estimate confidence levels for
a Gaussian hypothesis test, forget histograms. Sort the samples,
determine the empirical CDF and apply a statistical test like K-S or
Anderson-Darling.
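
A rough sketch of the sort / empirical CDF / K-S steps in Python (standard library only) follows. The function names are my own. The p-value uses the asymptotic Kolmogorov series, which assumes the hypothesized mean and sigma were fixed in advance; if they are estimated from the same sample, this p-value is too large and a corrected test (e.g. Lilliefors) is needed. Anderson-Darling is not shown.

```python
import math
import statistics


def ks_statistic(samples, dist):
    """Largest gap between the empirical CDF of samples and dist.cdf."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = dist.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def kolmogorov_pvalue(d, n, terms=100):
    """Asymptotic p-value: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 t^2), t = sqrt(n) * d."""
    t = math.sqrt(n) * d
    if t < 0.2:  # the p-value is essentially 1 for such small t
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


if __name__ == "__main__":
    hypothesized = statistics.NormalDist(0.0, 1.0)  # parameters fixed in advance
    data = [-1.2, -0.4, 0.1, 0.3, 0.8, 1.5, -0.7, 0.2]
    d = ks_statistic(data, hypothesized)
    print(d, kolmogorov_pvalue(d, len(data)))
```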
Hope this helps.

Greg (Brown '62)
Gregory E. Heath     [EMAIL PROTECTED]      The views expressed here are
M.I.T. Lincoln Lab   (781) 981-2815         not necessarily shared by
Lexington, MA        (781) 981-0908(FAX)    M.I.T./LL or its sponsors
02420-9185, USA