On 06/06/2010 10:49 PM, Mark Seeto wrote:
Hello,
I have a couple of questions about the ols function in Frank Harrell's rms
package.
Is there any way to specify variables by their column number in the data
frame rather than by the variable name?
For example,
library(rms)
x1 <- rnorm(100, 0, 1)
x2 <- rnorm(100, 0, 1)
x3 <- rnorm(100, 0, 1)
y <- x2 + x3 + rnorm(100, 0, 5)
d <- data.frame(x1, x2, x3, y)
rm(x1, x2, x3, y)
lm(y ~ d[,2] + d[,3], data = d) # This works
ols(y ~ d[,2] + d[,3], data = d) # Gives error
Error in if (!length(fname) || !any(fname == zname)) { :
missing value where TRUE/FALSE needed
However, this works:
ols(y ~ x2 + d[,3], data = d)
The reason I want to do this is to program variable selection for
bootstrap model validation.
A related question: does ols allow "y ~ ." notation?
lm(y ~ ., data = d[, 2:4]) # This works
ols(y ~ ., data = d[, 2:4]) # Gives error
Error in terms.formula(formula) : '.' in formula and no 'data' argument
Thanks for any help you can give.
Regards,
Mark
Hi Mark,
It appears that you answered the questions yourself. rms wants real
variables or transformations of them. It makes certain assumptions
about names of terms. The y ~ . should work though; sometime I'll have
a look at that.
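
One possible workaround for both questions, offered as a minimal sketch rather than a documented rms feature: translate the column numbers into variable names and build the formula from those names, so that ols only ever sees real variables. The objects cols, f and f_all are illustrative names that do not appear in the original post, and set.seed() is added only to make the example repeatable. reformulate() is part of base R's stats package; the rms call is guarded in case the package is not installed.

```r
# Rebuild the example data from the original post
set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
d$y <- d$x2 + d$x3 + rnorm(100, 0, 5)

# Column numbers -> variable names -> formula with real variable names
cols <- c(2, 3)                                    # predictors chosen by position
f <- reformulate(names(d)[cols], response = "y")   # y ~ x2 + x3

# Stand-in for "y ~ .": every column except the response, spelled out
f_all <- reformulate(setdiff(names(d), "y"), response = "y")

# Fit with ols only if rms is available (assumption: it may not be installed)
if (requireNamespace("rms", quietly = TRUE)) {
  library(rms)
  fit <- ols(f, data = d)
  print(fit)
}
```

Because the formula contains plain names such as x2 and x3 rather than expressions like d[,2], it stays within what the reply above describes rms as expecting.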
But these are the small questions compared to what you really want. Why
do you need variable selection, i.e., what is wrong with having
insignificant variables in a model? If you indeed need variable
selection, see if backwards stepdown works for you. It is built into the
rms bootstrap validation and calibration functions.
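
As a rough illustration of that suggestion, the sketch below turns on backward stepdown inside the bootstrap by passing bw = TRUE to validate() and calibrate(). The data are simulated as in the original post; the seed and B = 200 are arbitrary choices made for this example, and the fit is stored with x = TRUE, y = TRUE so the resampling functions can refit the model.

```r
# Simulated data matching the structure of the original example
set.seed(2)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
d$y <- d$x2 + d$x3 + rnorm(100, 0, 5)

# Guarded in case rms is not installed
if (requireNamespace("rms", quietly = TRUE)) {
  library(rms)

  # Keep the design matrix and response in the fit for resampling
  fit <- ols(y ~ x1 + x2 + x3, data = d, x = TRUE, y = TRUE)

  # Bootstrap validation; backward stepdown is repeated in each resample
  v <- validate(fit, B = 200, bw = TRUE)
  print(v)

  # Bootstrap calibration with the same stepdown option
  cal <- calibrate(fit, B = 200, bw = TRUE)
}
```

Repeating the selection within every resample means the optimism estimate reflects the selection step as well as the coefficient estimation.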
Frank
--
Frank E Harrell Jr, Professor and Chairman
Department of Biostatistics, School of Medicine, Vanderbilt University
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help