On 9 Apr 2004 13:11:55 -0700, [EMAIL PROTECTED] (Roger Levy) wrote:

> Richard Ulrich <[EMAIL PROTECTED]> wrote in message news:<[EMAIL PROTECTED]>...
> > On 8 Apr 2004 09:40:26 -0700, [EMAIL PROTECTED] (Roger Levy) wrote:
> > 
[snip, much of the original]
RU > > 
> > How many do you have your smaller group?  If you have only
> > (say) 5 cases, you may be lucky to find anything with *one*  
> > variable, even though your total N  is 300.  - And, once the cases
> > are 'predicted' adequately, there is little for your extra variables
> > to do that won't show up as artifacts of over-fitting.
> > If you reach perfect prediction, then your likelihood surface
> > has a hole in it -- Not allowable.  Or, effectively, your predictors
> > can become collinear, when any predictor can substitute for
> > some other one:  That makes a flat likelihood surface, where
> > the SEs  become large because they are measured by the 
> > steepness of the slope.
RL > 
> By 'cases' I presume you mean distinct covariate vectors?  Sorry, I
> should have mentioned this -- the number of covariate vectors is on
> the order of the sample size (i.e., in the hundreds).  So I'm pretty
> sure that overfitting and collinearity are not really issues here
> (since I'm not including any interaction terms in the model).
>

Now you have confused me, a lot.
By 'cases in the smaller group', I am using the common metaphor 
of logistic regression, where the prediction is being made between
cases and non-cases.

With Ordinary Least Squares (OLS) linear regression, you can 
have almost as many predictors as you have total cases before you 
get into 'numerical' trouble.  Numerical trouble starts  
earlier for ML logistic regression -- you do not get much *power*
for either problem when there is not much 'information', because
the criterion is a dichotomy with only a few cases in one group.  But
present ML programs for logistic give you less warning, and they 
fail earlier.

Further:  What do you mean by 'distinct covariate vectors' that
are as numerous as the sample size (which I call the number of total
cases)?  Are you saying you have as many predictors as the sample N?
THAT  would be overfitting.
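To make the earlier point about perfect prediction concrete, here is a minimal sketch (my own toy example, not anything from your data): with completely separated data, the logistic log-likelihood keeps climbing toward 0 as the coefficient grows, so there is no finite maximum -- the 'hole' in the likelihood surface -- and the ML estimate and its SE diverge.

```python
import math

# Perfectly separated toy data: x fully predicts y.
xs = [-1, -1, -1, 1, 1, 1]
ys = [0, 0, 0, 1, 1, 1]

def loglik(beta, xs, ys):
    """Log-likelihood of the one-parameter logistic model
    P(y=1 | x) = 1 / (1 + exp(-beta * x))."""
    ll = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-beta * x))
        ll += math.log(p) if y else math.log(1.0 - p)
    return ll

# The log-likelihood only increases as beta grows -- no finite MLE,
# which is why ML logistic programs blow up (or quietly report huge SEs)
# under separation.
for beta in (1.0, 5.0, 20.0):
    print(beta, loglik(beta, xs, ys))
```

Each step doubles-and-more the fit, and the supremum (0) is never attained at any finite beta.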


 
> > 
> > I don't know why the LogXact method is not affected.
> > I suspect that it is, depending on which variety of SE it 
> > reports.  
> 
> LogXact determines (or samples from, depending on method chosen) the
> exact distribution of the conditional likelihood function and uses it
> to calculate p-values and confidence intervals.  Since it's using the
> exact distribution, it isn't susceptible to small-sample effects
> (similar to the way that, for 2x2 contingency table analysis, Fisher's
> exact test can be used for any distribution of cell counts whereas the
> chi-squared test has sample size limitations).

Well, sure, exact tests can be used.  But there is no power 
when there are far too many hypotheses.  However, I am 
potentially far off base in understanding the data, so I 
don't want to chase this digression.
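For readers following the Fisher's-exact analogy above, here is a minimal sketch of the 2x2 exact test using only the standard library (a textbook construction, not LogXact's algorithm): enumerate all tables with the observed margins and sum the hypergeometric probabilities no larger than that of the observed table.

```python
from math import comb

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]:
    sum the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed one."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2
    def hyper(k):  # P(top-left cell = k) with all margins fixed
        return comb(r1, k) * comb(r2, c1 - k) / comb(n, c1)
    p_obs = hyper(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # small tolerance guards against float round-off in the comparison
    return sum(hyper(k) for k in range(lo, hi + 1)
               if hyper(k) <= p_obs * (1 + 1e-9))

# A tiny, sparse table where the chi-squared approximation is dubious
# but the exact test is still valid:
print(fisher_two_sided(3, 0, 0, 3))  # exact p = 2/20 = 0.1
```

The validity of the p-value at any cell counts is exactly the small-sample advantage claimed for the exact approach; the power question raised above is separate.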

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html