"Simon, Steve, PhD" wrote:
>
> The discussion on EDSTAT-L of the regression model by Greg Adams has been
> very interesting. I would suggest that a Poisson regression model might be
> more appropriate here than a simple linear regression model, because the
> dependent variable (the number of votes for Buchanan) is a count.
I thought of that too, but it won't work: the raw count still depends
more on county size than on anything else, and since county size is
_inversely_ correlated with the proportion of Buchanan voters, modelling
the count directly is not a good plan. Thus there are advantages to
using proportions.
Also, a Poisson model is only directly indicated when the variation is
expected to arise mainly from sampling variation, so that f_Y(y|x) is
Poisson(g(x)). If we take a small range of county sizes (here, between
roughly 100K and 250K) and look at the distribution of Buchanan votes
Midpoint   Count
     100       1  *
     150       0
     200       1  *
     250       1  *
     300       3  ***
     350       0
     400       1  *
     450       0
     500       1  *
     550       4  ****
     600       0
     650       1  *
we see that it's far too dispersed for a Poisson model. Again, a
scatterplot shows that the sqrt transformation that would be appropriate
for a Poisson model is in fact still slightly undertransforming the data:
[Character scatterplot: rootBuch (vertical axis, ticks 0 to 60) against
logPop (horizontal axis, ticks 7.2 to 13.2). The point layout did not
survive; one point stands alone near rootBuch = 60.]
same without Palm Beach:
[Character scatterplot: sqrtBuch (vertical axis, ticks 0 to 30) against
logPop (horizontal axis, ticks 7.2 to 13.2), with Palm Beach omitted
(N* = 1). The point layout did not survive.]
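A rough numeric check of the overdispersion seen in the histogram above,
as a minimal Python sketch. It assumes each county's vote count can be
approximated by its histogram bin midpoint, and it ignores the fact that
a Poisson mean would itself vary somewhat across the 100K-250K size
range; the variable names are illustrative only. For Poisson counts with
a common mean, the variance/mean ratio should be near 1.

```python
from statistics import mean, variance

# Binned Buchanan vote counts read off the histogram:
# (bin midpoint, number of counties). Empty bins are omitted.
bins = [(100, 1), (200, 1), (250, 1), (300, 3), (400, 1),
        (500, 1), (550, 4), (650, 1)]

# Expand to one approximate value per county (assumption: midpoint = value).
votes = [mid for mid, n in bins for _ in range(n)]

m = mean(votes)
v = variance(votes)      # sample variance, n - 1 denominator
dispersion = v / m       # about 1 if counts were Poisson with a common mean

print(f"n = {len(votes)}, mean = {m}, variance = {v}, "
      f"variance/mean = {dispersion:.1f}")
```

With these binned values the ratio comes out far above 1, which is the
sense in which the counts are too dispersed for a Poisson model.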
The sqrt(Buchanan votes)*cuberoot(population) model is not too bad;
however, I would argue that my model (log proportion of Buchanan votes *
log population) fits the regression hypotheses better. BTW, if a log
transformation is being used, the raw Buchanan votes will give an
equivalent fit, with equal residuals, since log(BV/pop) =
log(BV) - log(pop); only the slope changes, by exactly 1. However, given
the confusing effect of population size at any other strength of
transformation, it is probably better to think "proportion".
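The equivalence of the two log fits can be checked with a short Python
sketch. The county data here are synthetic (made up purely for
illustration, not the Florida returns), and ols() and residuals() are
hypothetical helper names of my own.

```python
import math
import random

def ols(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def residuals(x, y):
    """Residuals from the least-squares line of y on x."""
    a, b = ols(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

random.seed(0)
# Synthetic counties: populations and vote counts are invented.
pop = [random.uniform(5e3, 1e6) for _ in range(50)]
bv = [p * 0.02 * p ** -0.15 * math.exp(random.gauss(0, 0.3)) for p in pop]

log_pop = [math.log(p) for p in pop]
log_bv = [math.log(v) for v in bv]                       # raw votes, logged
log_prop = [math.log(v / p) for v, p in zip(bv, pop)]    # proportion, logged

_, slope_raw = ols(log_pop, log_bv)
_, slope_prop = ols(log_pop, log_prop)

# Largest difference between the two sets of residuals.
max_diff = max(abs(r1 - r2) for r1, r2 in
               zip(residuals(log_pop, log_bv), residuals(log_pop, log_prop)))

print("slope difference:", slope_raw - slope_prop)
print("largest residual difference:", max_diff)
```

The slope difference should be 1 and the residuals should agree up to
floating-point rounding, whatever data are used; at any other power
transformation no such identity holds.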
BTW: under my model, Palm Beach is at the 99.5% prediction level. That
makes it a much less drastic outlier than in many other models, but I'd
argue that the figure is more supportable.
-Robert Dawson