On 26/02/14 01:40, Lorenzo Isella wrote:
Dear All,
Please consider the snippet at the end of the email.
It is representative of the problems I am experiencing.
I am trying to use glm (without using the formula interface because the
original data is quite large) to model the response in a case where the
predictors are a mix of numbers and factors.
In the end, I always end up with an error message, despite having tried
different choices for the "family" parameter.
Maybe I am missing the obvious, but can anyone run glm with a
combination of numbers and factors?
Any help is appreciated.
Cheers

Lorenzo




###############################################################
set.seed(1234)

x <- rnorm(1000)
dim(x) <- c(100,10)
x <- as.data.frame(x)
names(x) <- LETTERS[seq(10)]

x$J <- round(x$J)

x$J <- as.factor(x$J)

y <- x$A
x <- subset(x, select=-c(A))

model <- glm.fit(x, y)  ## , family=gaussian)

From the help for glm.fit:

For glm.fit: x is a ***design*** matrix of dimension n * p, and y is
a vector of observations of length n.

(Emphasis mine.)

So if you want to use (or insist on using) glm.fit() rather than glm(),
you will have to construct your own design matrix, i.e. replace each
factor column by k-1 columns of dummy variables (where k is the number
of levels of the given factor). Note that "x" should really be a
*matrix*, not a data frame, although it seems that data frames all of
whose columns are numeric get coerced to matrices, so it doesn't matter
much.
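
A minimal sketch of building such a design matrix, assuming the data
from the snippet above: model.matrix() expands the factor J into k-1
dummy columns and adds an intercept column. The name X is only
illustrative, and for very large data this may not be the most
memory-efficient route. Note that glm.fit() is given a family object,
gaussian(), rather than the bare function.

```r
set.seed(1234)

# Same data as in the original snippet
x <- rnorm(1000)
dim(x) <- c(100, 10)
x <- as.data.frame(x)
names(x) <- LETTERS[seq(10)]
x$J <- as.factor(round(x$J))

y <- x$A
x <- subset(x, select = -c(A))

# Build the design matrix: intercept column, the numeric columns B..I,
# and k-1 dummy columns for the factor J
X <- model.matrix(~ ., data = x)

# glm.fit() now receives a numeric matrix
model <- glm.fit(X, y, family = gaussian())
print(model$coefficients)
```

The coefficients should agree with those from glm(y ~ ., data = x),
since glm() builds the same kind of design matrix internally.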

cheers,

Rolf Turner

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
