[R] Comparing histograms?

2010-05-13 Thread Jonathan Greenberg
Rhelpers:

I'm curious what the appropriate analysis is for testing the
hypothesis that two histograms are statistically different from one
another.  Thanks!

--j

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Comparing histograms?

2010-05-13 Thread Ted Harding
On 13-May-10 16:00:51, Jonathan Greenberg wrote:
> Rhelpers:
> I'm curious what the appropriate analysis is for testing the
> hypothesis that two histograms are statistically different from one
> another.  Thanks!
>
> --j

That's potentially several questions in one, and each question
might have more than one answer!

If you have two histograms with the same k bins (i.e. the same
break-points), then you could think of them as a two-way (2 x k)
table of counts, and use a chi-squared or likelihood-ratio test.
But then you may well have to take account of the reasoning that
went into the choice of breaks defining the bins (for example,
whether they were chosen after looking at the data).
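Here is a minimal sketch of the 2 x k approach, written in Python with only the standard library because the thread itself gives no code. It computes the Pearson chi-squared and likelihood-ratio (G^2) statistics for two lists of bin counts over the same breaks, with p-values from the chi-squared distribution on k - 1 degrees of freedom. The helper names `two_by_k_tests` and `chi2_sf` are hypothetical. Bins that are empty in both histograms are dropped, and the chi-squared approximation assumes the expected counts are not too small.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-squared with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i=0}^{df/2 - 1} h**i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{i=1}^{(df-1)/2} h**(i - 1/2) / Gamma(i + 1/2)
    total = sum(h ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * total


def two_by_k_tests(counts1, counts2):
    """Pearson X^2 and likelihood-ratio G^2 tests of homogeneity for a 2 x k table.

    counts1 and counts2 must be bin counts over the SAME break-points.
    Bins that are empty in both histograms are dropped before testing.
    """
    if len(counts1) != len(counts2):
        raise ValueError("both histograms must use the same bins")
    cols = [(a, b) for a, b in zip(counts1, counts2) if a + b > 0]
    n1 = sum(a for a, _ in cols)
    n2 = sum(b for _, b in cols)
    if n1 == 0 or n2 == 0:
        raise ValueError("each histogram needs at least one observation")
    total = n1 + n2
    x2 = g2 = 0.0
    for a, b in cols:
        for obs, row_total in ((a, n1), (b, n2)):
            expected = row_total * (a + b) / total  # expected count under homogeneity
            x2 += (obs - expected) ** 2 / expected
            if obs > 0:  # 0 * log(0) is taken as 0
                g2 += 2.0 * obs * math.log(obs / expected)
    df = len(cols) - 1  # (2 - 1) * (k - 1) for the non-empty bins
    return {"X2": x2, "G2": g2, "df": df,
            "p_X2": chi2_sf(x2, df), "p_G2": chi2_sf(g2, df)}
```

For example, `two_by_k_tests([10, 20], [20, 10])` gives X2 = 100/15 (about 6.67) on 1 degree of freedom, with p about 0.0098.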

If the two histograms have different numbers of bins (and
possibly different break-points), then that approach would
not work. You could instead consider fitting a distribution
of the same type (e.g. Normal) to each histogram separately
and to both jointly, and then doing a likelihood-ratio test,
basically along the following lines.

Given
Hist1: m[1] out of M in (a[0],a[1]), m[2] out of M in (a[1],a[2]),
   ... m[k1] out of M in (a[k1-1],a[k1])

Hist2: n[1] out of N in (b[0],b[1]), n[2] out of N in (b[1],b[2]),
   ... n[k2] out of N in (b[k2-1],b[k2])

Then:
A: Fit a Normal distribution by maximum likelihood (grouped
data) to the combined Hist1 and Hist2 data (same mean mu and
variance V for both). Watch out: you need to condition on
the totals M and N.

B: Fit a Normal distribution (mean mu1, variance V1) to the
Hist1 data, and a separate one (mean mu2, variance V2) to Hist2.

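In fit B you have fitted 4 parameters; in fit A, 2. Hence,
under the null hypothesis that both histograms come from the
same Normal distribution, 2*(log likelihood from B - log
likelihood from A) will have approximately (for large M and N)
a chi-squared distribution with 4 - 2 = 2 degrees of freedom.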

You might want to impose that V1 = V2 (or not). If you do,
fit B has 3 parameters and the test has 3 - 2 = 1 degree of freedom.
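Fits A and B and the resulting likelihood-ratio test can be sketched the same way. This minimal Python illustration rests on assumptions not stated in the original. The outermost bins are treated as open-ended. The bin counts are treated as multinomial given M and N, so the multinomial constant cancels. A small hand-written Nelder-Mead minimiser stands in for a real optimiser such as R's optim(). All function names are hypothetical, and `equal_var=True` imposes V1 = V2.

```python
import math


def norm_cdf(z):
    """Standard Normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def grouped_loglik(breaks, counts, mu, sigma):
    """Grouped-data Normal log-likelihood, omitting the multinomial constant.

    ASSUMPTION: the outermost bins are open-ended, i.e. the first bin is
    (-inf, breaks[1]) and the last is (breaks[-2], +inf).
    """
    edges = [-math.inf] + list(breaks[1:-1]) + [math.inf]
    ll = 0.0
    for lo, hi, n in zip(edges[:-1], edges[1:], counts):
        if n > 0:  # empty bins contribute nothing
            p = norm_cdf((hi - mu) / sigma) - norm_cdf((lo - mu) / sigma)
            ll += n * math.log(max(p, 1e-300))  # guard against log(0)
    return ll


def nelder_mead(f, x0, steps, tol=1e-10, max_iter=5000):
    """Minimal Nelder-Mead minimiser; returns (best point, best value)."""
    n = len(x0)
    simplex = [list(x0)] + [[x + (steps[i] if j == i else 0.0) for j, x in enumerate(x0)]
                            for i in range(n)]
    vals = [f(v) for v in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=vals.__getitem__)
        simplex = [simplex[i] for i in order]
        vals = [vals[i] for i in order]
        if vals[-1] - vals[0] < tol:
            break
        cen = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        refl = [c + (c - w) for c, w in zip(cen, worst)]
        fr = f(refl)
        if fr < vals[0]:  # try expanding further
            expd = [c + 2.0 * (c - w) for c, w in zip(cen, worst)]
            fe = f(expd)
            simplex[-1], vals[-1] = (expd, fe) if fe < fr else (refl, fr)
        elif fr < vals[-2]:
            simplex[-1], vals[-1] = refl, fr
        else:  # contract towards the centroid
            con = [c + 0.5 * (w - c) for c, w in zip(cen, worst)]
            fc = f(con)
            if fc < vals[-1]:
                simplex[-1], vals[-1] = con, fc
            else:  # shrink every vertex towards the best one
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, v)]
                                    for v in simplex[1:]]
                vals = [vals[0]] + [f(v) for v in simplex[1:]]
    i = min(range(n + 1), key=vals.__getitem__)
    return simplex[i], vals[i]


def start_values(breaks, counts):
    """Rough (mean, log SD) from bin midpoints, used only to seed the optimiser."""
    mids = [(a + b) / 2.0 for a, b in zip(breaks[:-1], breaks[1:])]
    n = sum(counts)
    m = sum(c * x for c, x in zip(counts, mids)) / n
    v = sum(c * (x - m) ** 2 for c, x in zip(counts, mids)) / n
    return m, (0.5 * math.log(v) if v > 0 else 0.0)


def normal_lrt(breaks1, counts1, breaks2, counts2, equal_var=False):
    """LR test of fit A (one common Normal) against fit B (a Normal per histogram)."""
    def ll1(mu, log_sd):
        return grouped_loglik(breaks1, counts1, mu, math.exp(log_sd))
    def ll2(mu, log_sd):
        return grouped_loglik(breaks2, counts2, mu, math.exp(log_sd))
    m1, s1 = start_values(breaks1, counts1)
    m2, s2 = start_values(breaks2, counts2)
    step = math.exp((s1 + s2) / 2.0)  # initial step for the means: about one SD
    # The optimiser minimises, so fa and fb are NEGATIVE maximised log-likelihoods.
    # Fit A: common mu and sigma (2 parameters).
    _, fa = nelder_mead(lambda p: -(ll1(p[0], p[1]) + ll2(p[0], p[1])),
                        [(m1 + m2) / 2.0, (s1 + s2) / 2.0], [step, 0.5])
    if equal_var:
        # Fit B with V1 = V2: mu1, mu2 and one common sigma (3 parameters).
        _, fb = nelder_mead(lambda p: -(ll1(p[0], p[2]) + ll2(p[1], p[2])),
                            [m1, m2, (s1 + s2) / 2.0], [step, step, 0.5])
        df = 3 - 2
    else:
        # Fit B: separate (mu, sigma) for each histogram (4 parameters);
        # the likelihood factorises, so the two fits are done independently.
        _, f1 = nelder_mead(lambda p: -ll1(p[0], p[1]), [m1, s1], [step, 0.5])
        _, f2 = nelder_mead(lambda p: -ll2(p[0], p[1]), [m2, s2], [step, 0.5])
        fb = f1 + f2
        df = 4 - 2
    stat = max(0.0, 2.0 * (fa - fb))  # clamp tiny negative values from optimiser noise
    # Chi-squared upper tail: df=1 -> erfc(sqrt(x/2)); df=2 -> exp(-x/2).
    p_value = math.erfc(math.sqrt(stat / 2.0)) if df == 1 else math.exp(-stat / 2.0)
    return stat, df, p_value
```

Without V1 = V2, the fit-B likelihood is a sum of one term per histogram. That is why the sketch fits the two histograms separately rather than optimising all 4 parameters at once. The two histograms may use different break-points, which is the case this approach is meant for.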

The same applies to other types of distribution that you might
consider fitting to the data.

And so on!
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 13-May-10   Time: 20:14:51
-- XFMail --
