On Jun 13, 2010, at 10:20 PM, array chip wrote:

> Hi, this is not an R technical question per se. I know there are many excellent
> statisticians on this list, so here is my question: I have a dataset with ~1800
> observations and 50 independent variables, so there are about 35 samples per
> variable. Is it wise to build a stable multiple logistic model with 50
> independent variables? Any problem with this approach? Thanks
> 
> John


The general rule of thumb is to have 10-20 'events' per covariate degree of
freedom. Frank Harrell has suggested that in some cases that number should be
as high as 25.

The number of events is the count of the less frequent of the two possible
outcomes of your binary dependent variable.

Covariate degrees of freedom refers to the number of columns in the model
matrix, not counting the intercept. A continuous variable entered as a single
linear term contributes 1, a binary factor contributes 1, and a K-level factor
contributes K - 1.
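
As a rough illustration of that counting rule, here is a minimal sketch in
Python. The function name covariate_df and the three example variables are
hypothetical, made up for illustration only, and it assumes each continuous
variable enters as a single linear term with no interactions.

```python
def covariate_df(variables):
    """Count model-matrix columns, excluding the intercept.

    `variables` is a list of (name, kind, levels) tuples, where kind is
    "continuous" or "factor". This layout is an assumption made for this
    sketch, not anything taken from R or from the original post.
    """
    total = 0
    for name, kind, levels in variables:
        if kind == "continuous":
            total += 1  # a single linear term uses one column
        elif kind == "factor":
            if levels is None or levels < 2:
                raise ValueError(f"factor {name!r} needs at least 2 levels")
            total += levels - 1  # K levels -> K - 1 indicator columns
        else:
            raise ValueError(f"unknown kind for {name!r}: {kind!r}")
    return total


# Hypothetical variables: one continuous, one binary, one 4-level factor.
example = [
    ("age", "continuous", None),
    ("sex", "factor", 2),
    ("region", "factor", 4),
]
df_total = covariate_df(example)
print(df_total)  # 1 + 1 + 3 = 5
```

Interactions and nonlinear terms would add further columns, so a count like
this is a lower bound on what a richer model would use.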

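So if, out of your 1800 records, you have at least 500 to 1000 events (10 to 20
for each of 50 degrees of freedom), you may be OK. The exact number depends
upon how many of your 50 variables are K-level factors and whether or not you
need to consider interactions, both of which add degrees of freedom. Better if
you are towards the high end of that range, especially if the model is for
prediction rather than explanation, keeping in mind that with 1800 records the
less frequent outcome can number at most 900.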
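
To make the arithmetic concrete, here is a small Python sketch. The outcome
counts (600 and 1200) are hypothetical, since the original question does not
say how the 1800 records split, and the helper names are made up for
illustration. It also assumes all 50 variables use one degree of freedom each.

```python
def events_per_df(n_outcome_a, n_outcome_b, covariate_df):
    """Events per covariate degree of freedom.

    The number of 'events' is taken as the count of the less frequent
    of the two outcomes.
    """
    events = min(n_outcome_a, n_outcome_b)
    return events / covariate_df


def required_events(covariate_df, per_df):
    """Events needed to reach a given events-per-df guideline."""
    return covariate_df * per_df


n_records = 1800
n_df = 50  # assumes every variable contributes exactly one column

# Hypothetical split of the 1800 records between the two outcomes.
n_yes, n_no = 600, 1200

ratio = events_per_df(n_yes, n_no, n_df)
low = required_events(n_df, 10)   # lower guideline: 10 per df
high = required_events(n_df, 20)  # upper guideline: 20 per df
max_events = n_records // 2       # the rarer outcome cannot exceed half

print(ratio)       # 12.0
print(low, high)   # 500 1000
print(max_events)  # 900
```

With this made-up split the ratio lands at 12 events per degree of freedom,
inside the 10-20 guideline but near its low end.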

Two excellent references would be Frank's book:

  
http://www.amazon.com/Regression-Modeling-Strategies-Frank-Harrell/dp/0387952322/

and Steyerberg's book:

  
http://www.amazon.com/Clinical-Prediction-Models-Development-Validation/dp/038777243X/

Both provide guidance on model building and validation techniques.

HTH,

Marc Schwartz
