On Sun, 6 Feb 2000, Milo Schield wrote:

> QUESTION:  What is the theoretical maximum value of R-sq ** when binary 
> data (Y) is obtained from a simple linear model?

Not clear what "obtained from a simple linear model" means.  Are you 
using a model to _generate_ values of Y?  Or are you using such a model 
to _represent_ a relationship between X and Y in "real" data?
        But, for openers, if you're looking at data that are binary in Y 
and continuous in X, I would expect max R-sq to depend on (a) the 
proportion of Y's that are at one value (or the other;  symmetric about 
0.5) and (b) the degree of separation between values of X for one value 
of Y and values of X for the other value of Y.
        (I picture something like the following, WLOG choosing Y = {0,1} 
for convenience:

        Y=1 |                            * * *  *** * *  *
            |
            |
        Y=0 | * *  *** * **  *
            |
            +-----------------------------------------------
                                X

The larger the horizontal gap between max (X: Y=0) and min (X: Y=1), the 
greater the value of R-sq, ceteris paribus.  Since you have specified 
that you want the _maximum_ R-sq, we can rule out the situations in which 
the values of X overlap between the groups defined by Y.  You have not, 
however, specified how the conditional distributions of X are to differ 
from each other.)
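
A minimal sketch of points (a) and (b), assuming R-sq means the ordinary 
R-sq from an OLS regression of Y on X (i.e. the squared Pearson 
correlation).  The helper names and the unit-spaced layout of X within 
each group are my own illustrative assumptions, not anything specified in 
the question:

```python
def r_squared(xs, ys):
    """R-sq of the OLS line of ys on xs (the squared Pearson correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)


def separated_sample(n0, n1, gap):
    """Hypothetical layout: n0 unit-spaced X's with Y=0, then n1 with Y=1.

    `gap` is min(X: Y=1) - max(X: Y=0), so the two groups never overlap.
    """
    xs = [float(i) for i in range(n0)]
    xs += [float(n0 - 1 + gap + j) for j in range(n1)]
    ys = [0] * n0 + [1] * n1
    return xs, ys


# (b) Widening the gap, with a fixed 10/10 split, raises R-sq.
for gap in (1, 5, 25):
    xs, ys = separated_sample(10, 10, gap)
    print("gap", gap, "R-sq", round(r_squared(xs, ys), 4))

# (a) Same gap and same total n, but a lopsided 2/18 split.
print("10/10 split:", round(r_squared(*separated_sample(10, 10, 5)), 4))
print(" 2/18 split:", round(r_squared(*separated_sample(2, 18, 5)), 4))
```

In this layout the lopsided split also changes the within-group spread of 
X, so the second comparison illustrates the dependence on the proportion 
rather than isolating it.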

> The data is binary with Y values taken from a linear model going from 0 
> to 1 over the range of X.

Can you specify what model you have in mind, and how Y values are "taken" 
from it?

> The binary sequences of Y values are organized to minimize* the 
> standard deviation around the model.
> 
> TYPE REGRESSION            DISTRIBUTION OF X VALUES
> a.     OLS                   linear
> b.     OLS                   normal [width truncated at 6 sigma?]
> c.     Logistic              linear
> d.     Logistic              normal [width truncated at 6 sigma?]

        O.K., stop a minute.  I think I know what a normal distribution 
is, arbitrarily truncated or not;  but what is a "linear" distribution? 
Do you perchance mean "uniform" or "rectangular"?  Over what range? 

> Based on some discrete trials, I get the following estimates for R-sq:
> a.  99%
> b.  16%
> c.  96%
> d.  16%

I don't see how this can make sense without rather more detail (agreeing, 
as I often do, with Rich Ulrich).  "Some discrete trials"??  (Not, 
doubtless, to be confused with indiscreet trials, nor presumably with 
continuous trials;  but it's still a puzzle what these trials might be.)

> * On the selection of binary Y values.  Suppose the X values are linearly
> distributed from 0 to 1 and the Model is Y=X.  In the discrete case with 100
> points, the first 5 would be all zeros and the last 5 would be all ones.  

Umm...  I'm sorry, but I don't see why, with 100 points (presumably 
equally spaced?  Is that part of what you meant by "linearly distributed"?), 
the first 50 (not just 5) wouldn't be all zeroes and the last 50 all ones.
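
For what it's worth, here is a quick check of that arrangement under my 
reading of it: 100 equally spaced X values on [0, 1], the first 50 Y's 
zero and the last 50 one, and R-sq taken from an OLS fit of Y on X.  The 
equal spacing and the variable names are my assumptions, and this may not 
be the arrangement you intended:

```python
n = 100
xs = [i / (n - 1) for i in range(n)]      # equally spaced on [0, 1]
ys = [0] * (n // 2) + [1] * (n // 2)      # first half zeros, second half ones

mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# R-sq of the OLS line of Y on X equals the squared Pearson correlation.
r2 = sxy ** 2 / (sxx * syy)

# Closed form for this particular arrangement: 3 n^2 / (4 (n^2 - 1)).
closed_form = 3 * n * n / (4 * (n * n - 1))

print("R-sq:", round(r2, 4), " closed form:", round(closed_form, 4))
```

Under these assumptions R-sq comes out at about 0.75, approaching 3/4 
exactly as the number of points grows.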

> At the center, half the points would be zeroes and the other half would 
> be ones.

At the center of what?  If you mean at the center of the 100 points, that 
would presumably comprise 2 points (100 being even;  if it were 101 
points, the center would comprise one point).  What are all these 
"points" (and how many of them are there?) half of which would be zeroes 
etc.?  Are we to understand that the half that are zero are randomly zero 
in some sense of "random"?  Why?

 ------------------------------------------------------------------------
 Donald F. Burrill                                 [EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,          [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264                                 603-535-2597
 184 Nashua Road, Bedford, NH 03110                          603-471-7128  


