Given any regression model, created for instance by lm, lme, lmer, or rqs, such 
as

z1<-lm(weight~poly(Time,2), data=ChickWeight)

I would like a general way to obtain only those variables used for the model.  In the current example, this 
"minimal data frame" would consist of the "weight" and "Time" variables and 
none of the other columns of ChickWeight.

(Motivation: Sometimes the data frame contains thousands of variables which are 
not used in the current regression, and I do not want to keep copying and 
propagating them.)

The "model" component of the regression object doesn't serve this purpose:

head(z1$model)
  weight poly(Time, 2).1 poly(Time, 2).2
1     42    -0.066020938     0.072002235
2     51    -0.053701293     0.031099018
3     59    -0.041381647    -0.001334588
4     64    -0.029062001    -0.025298582
5     76    -0.016742356    -0.040792965
6     93    -0.004422710    -0.047817737

The following awkward workaround seems to do it when variable names contain only 
"word characters" as defined by regex:

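minimalvariablesfrommodel20161120 <- function(object, originaldata) {
  # stopifnot(!missing(originaldata))
  stopifnot(!missing(object))
  # Split the formula text on non-word characters and keep only those
  # tokens that are column names of the original data frame.
  intersect(
    unique(unlist(strsplit(format(object$call$formula), split = "\\W", perl = TRUE))),
    names(originaldata)
  )
}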

minimalvariablesfrommodel20161120(z1, ChickWeight)
[1] "weight" "Time"


But if a variable has a space in its name, my workaround fails:

ChickWeight$"dog tail"<-ChickWeight$Time
z1<-lm(weight~poly(`dog tail`,2), data=ChickWeight)
head(z1$model)
  weight poly(`dog tail`, 2).1 poly(`dog tail`, 2).2
1     42          -0.066020938           0.072002235
2     51          -0.053701293           0.031099018
3     59          -0.041381647          -0.001334588
4     64          -0.029062001          -0.025298582
5     76          -0.016742356          -0.040792965
6     93          -0.004422710          -0.047817737
minimalvariablesfrommodel20161120(z1, ChickWeight)
[1] "weight"
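
The failure can be seen by looking at the tokens directly. The sketch below applies the same split to a literal string, which is assumed here to match what format(object$call$formula) returns for this model. The names "tokens" and "found" are only illustrative.

```r
# Formula text as it is assumed to be deparsed, with the name in backticks
formula_text <- "weight ~ poly(`dog tail`, 2)"

# Same split as in the workaround: break on every non-word character
tokens <- unique(unlist(strsplit(formula_text, split = "\\W", perl = TRUE)))

# The space is a non-word character, so the name is broken into
# "dog" and "tail" and never appears as the single token "dog tail"
"dog tail" %in% tokens   # FALSE

# Matching against the column names therefore recovers only "weight"
found <- intersect(tokens, c("weight", "Time", "Chick", "Diet", "dog tail"))
found                    # "weight"
```

So any name that needs backticks in a formula will be missed by a split on "\\W".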


Is there a more elegant, and hence more reliable, approach?
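
One possible direction, offered only as a sketch: base R's all.vars() returns the variable names that appear in a formula, without any text splitting. The code below assumes that formula() works on the fitted object, which holds for lm. What formula() returns for lme, lmer, or quantile regression objects may differ (for example, it may leave out random-effects terms), and that is not checked here. The names "cw", "z2", "vars", and "minimal" are only illustrative.

```r
# Plain data frame copy, so that subsetting does not depend on the
# grouped-data class that ChickWeight carries
cw <- as.data.frame(ChickWeight)
cw$"dog tail" <- cw$Time

z2 <- lm(weight ~ poly(`dog tail`, 2), data = cw)

# all.vars() walks the formula as a language object, so a name with
# a space in it comes back intact and without backticks
vars <- all.vars(formula(z2))
vars   # "weight" "dog tail"

# Keep only names that are columns of the data; all.vars() would also
# return any workspace object mentioned in the formula
vars <- intersect(vars, names(cw))

# The "minimal data frame"
minimal <- cw[, vars, drop = FALSE]
```

The intersect() step is kept from the original workaround, since a formula may refer to objects that are not columns of the data.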

Thanks

Jacob A. Wegelin
Assistant Professor
C. Kenneth and Dianne Wright Center for Clinical and Translational Research
Department of Biostatistics
Virginia Commonwealth University
830 E. Main St., Seventh Floor
P. O. Box 980032
Richmond VA 23298-0032
U.S.A. URL: http://www.people.vcu.edu/~jwegelin
