On 26/02/14 01:40, Lorenzo Isella wrote:
Dear All,
Please consider the snippet at the end of the email.
It is representative of the problems I am experiencing.
I am trying to use glm (without using the formula interface because the
original data is quite large) to model the response in a case where the
predictors are a mix of numbers and factors.
In the end, I always end up with an error message, despite having tried
different choices for the "family" parameter.
Maybe I am missing the obvious, but can anyone run glm with a
combination of numbers and factors?
Any help is appreciated.
Cheers
Lorenzo
###############################################################
set.seed(1234)
x <- rnorm(1000)
dim(x) <- c(100,10)
x <- as.data.frame(x)
names(x) <- LETTERS[seq(10)]
x$J <- round(x$J)
x$J <- as.factor(x$J)
y <- x$A
x <- subset(x, select=-c(A))
model <- glm.fit(x, y) ## , family=gaussian
From the help for glm.fit:
For glm.fit: x is a ***design*** matrix of dimension n * p, and y is
a vector of observations of length n.
(Emphasis mine.)
So if you want to/insist on using glm.fit() rather than glm() you will
have to construct your own design matrix, i.e. replace each factor column
by k-1 columns of dummy variables (where k is the number of levels of the
given factor). Note that "x" should really be a *matrix*, not a data
frame, although it seems that data frames (all of whose columns are
numeric) get coerced to matrices, so it doesn't matter much.
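
For what it's worth, here is a minimal sketch of one way to build such a
design matrix, assuming the default treatment contrasts and using
model.matrix() to do the dummy coding. The names "X" and "fit" are mine,
not from the original snippet, and the data are just the simulated data
from the question. Note that model.matrix() also adds an explicit
"(Intercept)" column, which glm.fit() will not add for you.

```r
## Recreate the simulated data from the question.
set.seed(1234)
x <- rnorm(1000)
dim(x) <- c(100, 10)
x <- as.data.frame(x)
names(x) <- LETTERS[seq(10)]
x$J <- as.factor(round(x$J))
y <- x$A
x <- subset(x, select = -c(A))

## Build the design matrix: numeric columns are kept as they are,
## the factor J is expanded into k-1 dummy columns (treatment
## contrasts by default), and an intercept column is prepended.
X <- model.matrix(~ ., data = x)

## X is now a numeric matrix, so glm.fit() accepts it.
fit <- glm.fit(X, y, family = gaussian())
fit$coefficients
```

For large data, model.matrix() still has to hold the full matrix in
memory, so whether this helps with the size problem depends on the case.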
cheers,
Rolf Turner
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help