[R] manipulating data via the factors of a term in a lm()

Michael Friendly Thu, 16 Apr 2009 09:05:50 -0700

[Env: R 2.8.1, Win XP]

For a package I'm working on, I need two small helper functions tomanipulate thedata used in an lm or mlm object, given the *name* of a term, which willalways be

a character string representing

a factor ("A") or an interaction of two or more factors ("A:B", "A:B:C",...). I'dprefer not to have to require() additional packages if possible. Thefunctions I need

are

[1] termMeans(mod, term): Find the (possibly vector) response means fora model term, specified by a string, "A", "A:B", ...[2] dataIndices(mod, term): Find indices of the observations in the datacorresponding to the combination of levels in the term


Here is a small example to illustrate what I need: a 3-factor mlm

factors <- expand.grid(A=factor(1:3),B=factor(1:2),C=factor(1:2))
n <- nrow(factors)
responses <-data.frame(Y1=10+round(10*rnorm(n)),Y2=10+round(10*rnorm(n)))
test <- data.frame(factors, responses)
mod <- lm(cbind(Y1,Y2) ~ A*B, data=test)

[1] Find the means for a model term, specified by a string, "A", "A:B","A:C", ...

Similar to:

> # main effect means
>  aggregate(responses,by=list(test$A), mean)
 Group.1    Y1    Y2
1       1 12.25  9.50
2       2 10.75  6.25
3       3 11.75 11.00
4       4 10.25  4.25

>> #interaction means

>  aggregate(responses,by=list(test$A,test$B), mean)
 Group.1 Group.2   Y1   Y2
1       1       1 16.0  5.5
2       2       1 13.5  8.0
3       3       1 16.0  7.5
4       4       1 10.0 10.5
5       1       2  8.5 13.5
6       2       2  8.0  4.5
7       3       2  7.5 14.5
8       4       2 10.5 -2.0

>

Here is what I've tried so far:

# find means for a model term in an lm or mlm object
termMeans <- function(mod, term){
   data <- model.frame(mod)
   Y <- model.response(data)
   factors <- data[, sapply(data, is.factor), drop=FALSE]
   term.factors <- strsplit(term, ":")
   means <- aggregate(Y, by=factors[,term.factors], mean)
}

but I'm missing something. Maybe there is an easier path using apply()or by()?


> termMeans(mod, "A")
Error in .subset(x, j) : invalid subscript type 'list'

[2] For some plots, I need to be able to assign symbol characteristics(col, pch, etc) by indexing a vector

in the form of arguments
    col=col[index], pch=pch[index], ...

where index is a vector of integers 1:# levels in a model term. Thus,for my example, with the data 'test'


> test
  A B C  Y1 Y2
1  1 1 1  22  6
2  2 1 1  22  2
3  3 1 1  16 11
4  4 1 1   2 13
5  1 2 1   0 19
6  2 2 1  -2 -7
7  3 2 1   5  8
8  4 2 1  37 -9
9  1 1 2  10  5
10 2 1 2   5 14
11 3 1 2  16  4
12 4 1 2  18  8
13 1 2 2  17  8
14 2 2 2  18 16
15 3 2 2  10 21
16 4 2 2 -16  5


I need a similar function dataIndex(mod, term):

dataIndex(mod,"A")
--> same as

> test$A
[1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4

dataIndex(mod,"A:B") should deliver

[1] 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8

# find sequential indices for observations in a data frame
# corresponding to the unique combinations of the levels
# of a given model term

dataIndex <- function(mod, term){
   data <- model.frame(mod)
   Y <- model.response(data)
   factors <- data[, sapply(data, is.factor), drop=FALSE]
   term.factors <- strsplit(term, ":")
   ???
}


thanks,
-Michael


--

Michael Friendly Email: friendly AT yorku DOT caProfessor, Psychology Dept.

York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] manipulating data via the factors of a term in a lm()

Reply via email to