On 8 Apr 2004 09:40:26 -0700, [EMAIL PROTECTED] (Roger Levy) wrote:
> Hi,
>
> I have a question regarding small/sparsely-populated datasets and the
> reliability of statistical inference in using traditional ML
> estimation in logistic regression models.
>
> I'm working with a sample with several hundred observations, with up
> to about a dozen plausible covariates, each of which has discrete
> ordinal values {1,0,-1}. On theoretical grounds (involving the
> problem domain) I believe it's pretty safe not to use interaction
> terms, at least until I've thoroughly investigated first-order models.
> The sample is relatively well-balanced with respect to any individual
> covariate, but obviously it is sparse with respect to the full
> potential covariate space. As I understand it, a sparsely populated
> covariate space does NOT mean that the ML estimates of covariate
> parameters are biased, so for purposes of, say, prediction and overall
> model goodness of fit I can safely use traditional asymptotic ML
> estimation techniques. However, according to my (vague) understanding
> a sparsely-populated covariate space DOES mean that inference
> regarding significance of parameter values and parameter confidence
> intervals by ML techniques WILL be unreliable. Herein is where my
> confusion lies.
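Just to put a number on that sparsity: with a dozen covariates each
taking values in {-1, 0, 1}, the full covariate space has 3^12 =
531,441 cells, so a few hundred observations leave almost every cell
empty and almost every observed covariate pattern unique. A throwaway
sketch in Python/numpy (the n of 300 and the uniform {-1,0,1} values
are just stand-ins for the data you describe):

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 300, 12                        # "several hundred" obs, a dozen covariates
    X = rng.integers(-1, 2, size=(n, k))  # each covariate in {-1, 0, 1} (assumed uniform here)

    print(3 ** k)                         # 531441 cells in the full covariate space
    print(len(np.unique(X, axis=0)))      # ~300 distinct patterns: nearly one per observation

So even though each covariate is well balanced on its own, the joint
covariate space is essentially all empty cells.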
>
> From what I've read (e.g., Agresti 1996), the large-sample ML-based
> approximation of standard error for parameter values is based on the
> law of large numbers. For a small covariate space, then, I can see
> why sparsely populated means unreliable SE estimates. If you have a
> sample size of a few hundred, on the other hand, it seems like the law
> of large numbers would apply even if the covariate space is large and
> sparsely populated. Now, I've empirically found with exact logistic
> regression (LogXact) that using only a half-dozen of my covariates I
> get considerable divergence in p-values for some parameters with ML
> versus exact techniques (technically, the network Monte Carlo method
> from Mehta and Patel 2000). But I don't understand the theoretical
> underpinning of why this is happening.
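Before getting to questions about your data: it may help to write down
what that large-sample approximation actually is. The MLE is treated
as approximately normal, with covariance equal to the inverse of the
observed information (the negative Hessian of the log-likelihood at
its maximum), and the reported SEs are the square roots of its
diagonal. Here is a minimal sketch in Python with numpy/scipy -- the
sample size, number of covariates, and coefficients are all made up
for illustration, and it only shows where the numbers come from, not
any particular package's output:

    import numpy as np
    from scipy import optimize, stats

    rng = np.random.default_rng(1)
    n, k = 300, 6                                 # sample size and covariate count: made up
    X = np.column_stack([np.ones(n), rng.integers(-1, 2, size=(n, k))])
    beta_true = np.array([0.0, 1.0, -0.5, 0.5, 0.0, 0.0, 0.0])   # arbitrary coefficients
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_true))))

    def negloglik(beta):
        eta = X @ beta
        return -(y @ eta - np.logaddexp(0.0, eta).sum())   # -sum[ y*eta - log(1+e^eta) ]

    def grad(beta):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        return -(X.T @ (y - p))

    beta_hat = optimize.minimize(negloglik, np.zeros(X.shape[1]),
                                 jac=grad, method="BFGS").x

    # Observed information = negative Hessian of the log-likelihood at the MLE;
    # for logistic regression it is X' W X with W = diag(p_i * (1 - p_i)).
    p_hat = 1.0 / (1.0 + np.exp(-(X @ beta_hat)))
    info = X.T @ (X * (p_hat * (1.0 - p_hat))[:, None])

    se = np.sqrt(np.diag(np.linalg.inv(info)))    # asymptotic (Wald) standard errors
    z = beta_hat / se
    p_wald = 2.0 * stats.norm.sf(np.abs(z))       # the usual large-sample p-values
    print(np.column_stack([beta_hat, se, p_wald]))

Everything here leans on the normal approximation to the distribution
of the estimates. The exact conditional approach in LogXact (the Mehta
and Patel network algorithm, or its Monte Carlo variant) does not use
that approximation at all, so the two sets of p-values can drift apart
whenever the approximation is poor.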
How many do you have in your smaller outcome group? If you have only
(say) 5 cases, you will be lucky to find anything with even *one*
variable, even though your total N is 300. And once those cases are
'predicted' adequately, there is little left for your extra variables
to do that won't show up as artifacts of over-fitting.
If you reach perfect prediction (complete separation), the likelihood
has no finite maximum: the estimates run off toward infinity, which is
not allowable. Or, effectively, your predictors can become collinear,
when any predictor can substitute for some other one. That makes a
nearly flat likelihood surface, and the SEs become large because they
are measured by the curvature of the log-likelihood at its maximum;
the flatter the surface, the larger the SE.
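To see that flatness in numbers, run the same curvature-based SE
calculation from the sketch above on data that are only a whisker away
from perfect prediction. This is simulated, single-predictor data,
nothing like yours -- the only point is that when the two outcome
groups barely overlap, the log-likelihood is nearly flat around its
maximum and the SEs balloon:

    import numpy as np
    from scipy import optimize
    from scipy.special import expit                 # numerically stable 1/(1+exp(-eta))

    def logit_mle_se(X, y):
        """MLE and curvature-based (Wald) SEs for a logistic regression."""
        negloglik = lambda b: -(y @ (X @ b) - np.logaddexp(0.0, X @ b).sum())
        grad = lambda b: -(X.T @ (y - expit(X @ b)))
        b = optimize.minimize(negloglik, np.zeros(X.shape[1]),
                              jac=grad, method="BFGS").x
        w = expit(X @ b) * (1.0 - expit(X @ b))
        info = X.T @ (X * w[:, None])               # curvature of the log-likelihood at its peak
        return b, np.sqrt(np.diag(np.linalg.inv(info)))

    rng = np.random.default_rng(2)
    n = 300
    x = np.sort(rng.normal(size=n))
    X = np.column_stack([np.ones(n), x])

    # Well-behaved outcome: plenty of overlap between the y=0 and y=1 groups.
    y_mixed = rng.binomial(1, expit(x))

    # Nearly separated outcome: the sign of x predicts y almost perfectly; a few
    # labels right at the boundary are flipped so the MLE still exists.
    y_near = (x > 0).astype(float)
    j = np.searchsorted(x, 0.0)                     # index of the smallest positive x
    y_near[j:j + 3] = 0.0
    y_near[j - 3:j] = 1.0

    print(logit_mle_se(X, y_mixed))   # modest slope, modest SE
    print(logit_mle_se(X, y_near))    # huge slope, huge SE: the surface is nearly flat

Push those few flipped labels back and you have perfect separation:
then there is no finite maximum at all, and no curvature to measure.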
I don't know why the LogXact method is not affected.
I suspect that it is, depending on which variety of SE it
reports.
>
> Any explanation would be greatly appreciated! Also, I'd greatly
> appreciate any references to somewhat more technical discussion of the
> large-sample approximation of standard error for asymptotic ML
> techniques.
>
Check further references in your Agresti?
--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html