"Simon, Steve, PhD" wrote:
> 
> The discussion on EDSTAT-L of the regression model by Greg Adams has been
> very interesting. I would suggest that a Poisson regression model might be
> more appropriate here than a simple linear regression model, because the
> dependent variable (the number of votes for Buchanan) is a count. 

        I thought of that too, but it won't work: the raw count still depends
more on county size than on anything else, and as county size is
_inversely_ correlated with the proportion of Buchanan voters, this is
not a good plan. Thus there are advantages to using proportions.

        Also, a Poisson model is only directly indicated when the variation is
expected to arise mainly from sampling variation, so that f_Y(y|x) is
Poisson(g(x)). If we take a small range of county sizes (here, between
roughly 100K and 250K) and look at the distribution of Buchanan votes

Midpoint        Count
     100            1  *
     150            0
     200            1  *
     250            1  *
     300            3  ***
     350            0
     400            1  *
     450            0
     500            1  *
     550            4  ****
     600            0
     650            1  *

we see that it's far too dispersed for a Poisson model. Again, a
scatterplot shows that the sqrt transformation that would be appropriate
for a Poisson model is in fact still slightly undertransforming the data:


         -
       60+                                                 *
         -
 rootBuch-
         -
         -
       40+
         -
         -                                                *
         -                                               *   *
         -                                     ** ** * *      *
       20+                                          *  *
         -                               *22*****  **
         -                   *  *** *    *22 **  *
         -      *     * *24* * * * ** 2
         -      * * 22**3 *
        0+
          
+---------+---------+---------+---------+---------+------logPop  
         7.2       8.4       9.6      10.8      12.0      13.2


The same plot without Palm Beach:


         -                                                *
       30+
         -                                               *   *
 sqrtBuch-                                             *
         -                                     *  ** *        *
         -                                      *      *
       20+                                          *
         -                               *    *    **
         -                                22** **
         -                                **     *
         -                          *     *  **
       10+            *   *  *  *2*** *  * *
         -              **2  *
         -      *   *2**3***   *      *
         -      * * *     *
         -
        0+
          
+---------+---------+---------+---------+---------+------logPop  
         7.2       8.4       9.6      10.8      12.0      13.2
        N* = 1
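
As a rough numerical check on the dispersion claim, the sketch below
compares the variance of the Buchanan counts with their mean, using only
the bin midpoints from the histogram above. This is an approximation: the
raw county totals are not reproduced here, and the variable names are
purely illustrative. For a Poisson sample with a common mean the ratio
should be close to 1.

```python
from statistics import mean, variance

# Bin midpoint -> number of counties, copied from the text histogram.
# Midpoints stand in for the raw vote totals, so this is only approximate.
hist = {100: 1, 150: 0, 200: 1, 250: 1, 300: 3, 350: 0,
        400: 1, 450: 0, 500: 1, 550: 4, 600: 0, 650: 1}

# Expand the histogram into one (approximate) vote count per county.
votes = [mid for mid, n in hist.items() for _ in range(n)]

m = mean(votes)
v = variance(votes)      # sample variance (n - 1 denominator)
dispersion = v / m       # near 1 for Poisson counts with a common mean

print(f"n = {len(votes)}, mean = {m:.0f}, "
      f"variance = {v:.0f}, variance/mean = {dispersion:.1f}")
```

With these midpoints the ratio comes out near 72. Some of that excess must
come from the range of county sizes inside the window, so it is only a
crude indication, but it is a long way from 1.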

The sqrt(Buchanan votes) vs. cuberoot(population) model is not too bad;
however, I would argue that my model (log proportion of Buchanan votes
vs. log population) fits the regression hypotheses better.  BTW, if a log
transformation is being used, the raw Buchanan votes will give an
equivalent fit, with equal residuals, as log(BV/pop) = log(BV) - log(pop).
However, given the confusing effect of population
proportion at any other strength of transformation, it is probably
better to think "proportion".
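
To spell out the equivalence: if log(BV) = a + b*log(pop) + e, then
log(BV/pop) = a + (b - 1)*log(pop) + e, so the two fits share an
intercept and residuals and differ only by 1 in the slope. The sketch
below checks this with a hand-written least-squares fit. The (population,
votes) pairs are made up for illustration and are NOT the Florida data;
the helper name ols is likewise just an assumption of this example.

```python
import math

def ols(x, y):
    """Least-squares intercept, slope and residuals for y = a + b*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, resid

# Made-up (population, votes) pairs, for illustration only.
pop = [5_000, 20_000, 60_000, 150_000, 400_000, 900_000]
bv = [40, 110, 250, 420, 800, 1_500]

log_pop = [math.log(p) for p in pop]
log_bv = [math.log(v) for v in bv]
log_prop = [math.log(v / p) for v, p in zip(bv, pop)]

a1, b1, r1 = ols(log_pop, log_bv)     # log(BV) on log(pop)
a2, b2, r2 = ols(log_pop, log_prop)   # log(BV/pop) on log(pop)

max_diff = max(abs(u - w) for u, w in zip(r1, r2))
print(f"slope for log(BV):     {b1:.4f}")
print(f"slope for log(BV/pop): {b2:.4f}")
print(f"largest residual difference: {max_diff:.2e}")
```

The slopes differ by exactly 1 (up to rounding) and the residuals agree,
which is why the choice between counts and proportions only matters once
a transformation other than the log is used.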

        BTW: under my model, Palm Beach is at the 99.5% prediction level. This
makes it a much less drastic outlier than in many other models, but I'd
argue that it's more supportable.

        -Robert Dawson

        



