Hello,
Is this a bug or a feature?  I am using R 2.7.1 on Apple OS X.


> y <- matrix(1:3,nrow=3)     # y is a single-column matrix
> df <-data.frame(x=1:3,y=y)
> sapply(df,data.class)
       x         y
"numeric" "numeric"
> df$yy <- y
> sapply(df,data.class)
       x         y        yy
"numeric" "numeric"  "matrix"


I'm not sure why dataframes are allowed to have matrices as members. It also seems odd to me that y and yy end up with different classes even though both were created from the same matrix. It seems like the line between lists and dataframes has been blurred. When did dataframes start taking members other than vectors?
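
The difference between y and yy can be reproduced in a few lines. This is only a sketch of the behaviour as I understand it: data.frame() splits a matrix argument into ordinary columns, `$<-` stores the object as given, and wrapping the matrix in I() asks data.frame() to keep it intact. The names df1 and df2 are just for illustration.

```r
y <- matrix(1:3, nrow = 3)          # single-column matrix

# data.frame() splits the matrix into plain columns
df1 <- data.frame(x = 1:3, y = y)
is.matrix(df1$y)                    # FALSE

# `$<-` stores the matrix unchanged as one column
df1$yy <- y
is.matrix(df1$yy)                   # TRUE

# I() asks data.frame() to keep the matrix as-is
df2 <- data.frame(x = 1:3, y = I(y))
is.matrix(df2$y)                    # TRUE
```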

This is an issue if, for example, one builds a dataframe to fit a model and then wants to use predict(): you have to work a bit to avoid a type mismatch error.

> df$out = df$x+df$y+df$yy + rnorm(3)
> df
  x y yy       out
1 1 1  1  3.066348
2 2 2  2  5.516017
3 3 3  3 11.073452


> glmout = glm(out~x+y+yy,data=df)
> predict(glmout,newdata=data.frame(x=1:3,y=1:3,yy=1:3))
Error: variable 'yy' was fitted with type "nmatrix.1" but type "numeric" was supplied
>
> predict(glmout,newdata=data.frame(x=1:3,y=1:3,yy=matrix(1:3)))
Error: variable 'yy' was fitted with type "nmatrix.1" but type "numeric" was supplied
> predict(glmout,newdata=df[,-4])
       1         2         3
2.548387  6.551939 10.555491
Warning message:
In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type ==  :
 prediction from a rank-deficient fit may be misleading

I'm not really looking for a "solution", as I can already identify several workarounds. I guess I'm mainly trying to figure out what the philosophy is here.
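
For reference, one such workaround could be sketched as follows: strip the dimension from the matrix before it goes into the dataframe, so the model is fitted with a plain numeric variable and predict() accepts ordinary numeric newdata. The data here are made up for illustration (not the values above), and drop() is just one of several ways to do this.

```r
set.seed(1)                          # illustrative data only
y  <- matrix(c(2, 5, 3, 8, 1), ncol = 1)
df <- data.frame(x = 1:5)
df$yy  <- drop(y)                    # drop() removes the length-1 dimension
df$out <- df$x + df$yy + rnorm(5)

fit <- glm(out ~ x + yy, data = df)

# newdata built with plain vectors now matches the fitted type
pred <- predict(fit, newdata = data.frame(x = 1:5, yy = c(2, 5, 3, 8, 1)))
```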

This is also weird to me:

> df$yy <- as.data.frame(y)
> df
  x y V1       out
1 1 1  1  3.066348
2 2 2  2  5.516017
3 3 3  3 11.073452
> glmout = glm(out~x+y+V1,data=df)
Error in eval(expr, envir, enclos) : object "V1" not found
> glmout = glm(out~x+y+yy,data=df)
Error in model.frame.default(formula = out ~ x + y + yy, data = df, drop.unused.levels = TRUE) :
 invalid type (list) for variable 'yy'
> glmout = glm(out~x+y+yy$VI,data=df)
Error in model.frame.default(formula = out ~ x + y + yy$VI, data = df,  :
 invalid type (NULL) for variable 'yy$VI'

Is it impossible to build a model from a dataframe built this way?
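
A minimal sketch of what I would expect to work, assuming the column inside yy is named V1 (digit one) rather than VI: either refer to the inner column explicitly in the formula, or copy it out to an ordinary top-level column first. The data and the name v1 are made up for illustration.

```r
set.seed(1)                          # illustrative data only
y  <- matrix(c(2, 5, 3, 8, 1), ncol = 1)
df <- data.frame(x = 1:5)
df$yy  <- as.data.frame(y)           # nested data frame with one column, V1
df$out <- df$x + df$yy$V1 + rnorm(5)

# Option 1: refer to the inner column explicitly in the formula
fit1 <- glm(out ~ x + yy$V1, data = df)

# Option 2: copy the inner column out to an ordinary top-level column
df$v1 <- df$yy$V1
fit2 <- glm(out ~ x + v1, data = df)
```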


thanks, Daryl Morris
(Biostatistics, Univ. of Washington)

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help