On 10 Apr 2004 09:37:16 -0700, [EMAIL PROTECTED] (Roger Levy) wrote:

[snip, earlier posts of his and mine]

me >
> > Now you have confused me, a lot.
> > By 'cases in the smaller group', I am using the common metaphor
> > of logistic regression, where the prediction is being made between
> > cases and non-cases.

RL >
> Ah, I think I misunderstood you.  I'm not familiar with the
> cases/non-cases terminology of logistic regression -- could you
> explain this usage?
I will explain by way of an extract from a useful reference, which
includes the point I was making -- from
http://www2.chass.ncsu.edu/garson/pa765/logistic.htm
[after a number of pages]

"How many independents can I have?

"There is no precise answer to this question, but the more
independents, the more likelihood of multicollinearity.  Also, if you
have 20 independents, at the .05 level of significance you would
expect one to be found to be significant just by chance.  A rule of
thumb is that there should be no more than 1 independent for each 10
cases in the sample.  In applying this rule of thumb, keep in mind
that if there are categorical independents, such as dichotomies, the
number of cases should be considered to be the lesser of the groups
(ex., in a dichotomy with 500 0's and 10 1's, effective size would
be 10)."

 ---- end of extract from Garson.  You might find the whole document
interesting to scan.  (The effective-size arithmetic is spelled out
in the first sketch at the end of this post.)

[snip, more of mine]

> By a "distinct covariate vector" I mean the following: with n
> covariates (i.e., predictors) X_1,...,X_n, a covariate vector is a
> value [x_1,...,x_n] for a given data point.  So, for example, if I
> have a half-dozen binary covariates, there are 2^6=64 logically
> possible covariate vectors.

Now I wonder what computer program you are using.  What you describe
was once a concern for the packages, about 20 years ago.  I remember
a program that wanted me to sort my cases into order, so that each
'possible covariate vector' (as you say) would be contiguous and the
program could form the actual groups.  I do not *think* that is a
concern any more for modern packages, even though I find an ambiguous
reference to your concern in the Garson document I cited.

> Each of my covariates is three-valued.  So the situation for which
> ML and exact logistic regression were giving me substantially
> different results was with a half-dozen covariates, i.e. 3^6=729
> possible covariate vectors, and 300 datapoints, therefore the
> covariate space was sparsely populated.  I was not including any
> interaction terms, and in most cases each datapoint had a unique
> set of predictor values, so there were only seven parameters in my
> model and overfitting is almost certainly not an issue.
>
> So to restate my confusion, what I don't understand is the
> technical reason why asymptotic ML estimates for parameter
> confidence intervals and p-values would be unreliable in such a
> situation, since sample size is relatively large in absolute terms.

(A quick simulation at the end of this post shows just how sparsely
300 points populate 729 cells.)

Well, for one thing, there are two different versions of the p-values
available these days.  You want to look at the tests that are defined
by subtraction -- the likelihood-ratio tests, taken as the difference
in -2 log-likelihood between nested models -- rather than the Wald
test.  If you have an old program, it might only feature the Wald,
which is based on the ratio of the coefficient to its ASE (asymptotic
standard error).  See Garson for details and commentary; there is
also a small sketch of both tests at the end of this post.

As an alternative step, to diagnose your whole dataset and problem, I
suggest that you run an ordinary regression with the 0/1 criterion as
the outcome, or a two-group discriminant function.  Those two OLS
procedures are mathematically the same as each other, and they give
practically identical tests to logistic regression for most data with
Ns in the hundreds.  They are more robust than logistic regression
against overfitting, and they also give better diagnostics if that is
any threat; the last sketch below shows the comparison.
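To make Garson's rule of thumb concrete, here is a minimal sketch in
Python -- my own illustration of the arithmetic, not anything taken
from Garson's document, using his 500/10 example:

    # Garson's rule of thumb: at most 1 independent per 10 "effective"
    # cases, where the effective N for a dichotomous criterion is the
    # size of the smaller group.

    def effective_n(n_zeros, n_ones):
        """Effective sample size for a dichotomous criterion."""
        return min(n_zeros, n_ones)

    def max_independents(n_zeros, n_ones, cases_per_independent=10):
        """Rule-of-thumb ceiling on the number of independents."""
        return effective_n(n_zeros, n_ones) // cases_per_independent

    # Garson's example: a dichotomy with 500 0's and 10 1's.
    print(effective_n(500, 10))       # 10, not 510
    print(max_independents(500, 10))  # 1 -- one independent, at most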
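On the sparseness itself, a quick simulation.  I assume, purely for
illustration, that the 300 covariate vectors scatter uniformly over
the 729 cells -- real data surely are not uniform, but the point
survives:

    # How sparsely do 300 data points populate 3^6 = 729 possible
    # covariate vectors?  (Uniform scattering is assumed only for
    # the sake of illustration.)
    import random
    from collections import Counter

    random.seed(1)
    n_points, n_covariates, n_levels = 300, 6, 3
    data = [tuple(random.randrange(n_levels)
                  for _ in range(n_covariates))
            for _ in range(n_points)]

    counts = Counter(data)
    print(n_levels ** n_covariates)  # 729 possible vectors
    print(len(counts))               # distinct vectors actually seen
    print(max(counts.values()))      # size of the fullest cell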
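Here is a sketch of the Wald test next to the likelihood-ratio
("subtraction") test, assuming the Python statsmodels and scipy
packages and wholly simulated data -- a hedged illustration of the
two computations, not a recipe for your dataset:

    # Wald vs. likelihood-ratio test for one coefficient in a
    # logistic regression, on simulated data.
    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 300
    x1 = rng.integers(0, 3, n)      # a three-valued covariate
    x2 = rng.integers(0, 3, n)
    y = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 0.6 * x1))))

    X_full = sm.add_constant(np.column_stack([x1, x2]))
    X_red = sm.add_constant(x2)     # drop x1, to test its coefficient

    full = sm.Logit(y, X_full).fit(disp=0)
    red = sm.Logit(y, X_red).fit(disp=0)

    # Wald: (coefficient / ASE)^2, referred to chi-square(1).
    wald = (full.params[1] / full.bse[1]) ** 2
    p_wald = stats.chi2.sf(wald, df=1)

    # Likelihood-ratio: difference in -2 log-likelihood between the
    # nested models, also referred to chi-square(1).
    lr = 2 * (full.llf - red.llf)
    p_lr = stats.chi2.sf(lr, df=1)

    print(p_wald, p_lr)  # close here; they can diverge in sparse data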
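And the OLS cross-check, under the same assumptions (statsmodels,
simulated data).  The OLS regression on the 0/1 criterion stands in
for the two-group discriminant function here, since the two give the
same test:

    # OLS on the 0/1 criterion as a cross-check on the logistic fit.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 300
    x = rng.normal(size=n)
    y = rng.binomial(1, 1 / (1 + np.exp(-0.5 * x)))

    X = sm.add_constant(x)
    logit = sm.Logit(y, X).fit(disp=0)
    ols = sm.OLS(y, X).fit()

    # With N in the hundreds, the two slope p-values are usually close.
    print(logit.pvalues[1], ols.pvalues[1])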
--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
