on 07/29/2008 09:51 AM [EMAIL PROTECTED] wrote:
Hi list,
is there a package or function to compute the frequencies of pairs of
values in a variable across a grouping variable? E.g.:
d <- data.frame(ID=gl(2,3), F=c("A","B","C","A","C","D"))
d
  ID F
1  1 A
2  1 B
3  1 C
4  2 A
5  2 C
6  2 D
Now I want to summarize the frequencies of all pairs A-B, A-C, A-D,
B-C, B-D, C-D across ID:
  A B C D
A - 1 2 1
B - - 1 0
C - - - 1
Here, the combination A-C is the most frequent. The real problem behind
this is that 'F' codes diagnoses, and I am looking for the most frequent
pairs of diagnoses.
Thanks, Sven
I suspect that there might be something over in Bioconductor, but here
is one approach:
> table(data.frame(t(do.call(cbind,
                             tapply(d$F, d$ID,
                                    function(x) combn(as.character(x), 2))))))
   X2
X1  B C D
  A 1 2 1
  B 0 1 0
  C 0 0 1
See ?combn for how the initial pairs are created. This is done on a
per-ID basis using tapply(). The per-ID pair matrices are bound together
with cbind() and transposed so that each row holds one pair, the result
is converted to a data frame, and table() then cross-tabulates its two
columns.
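
For anyone who wants to see the intermediate steps, here is a rough
step-by-step sketch of the same idea. It is not the exact one-liner
above: the sorting, the de-duplication within an ID, the skipping of IDs
with fewer than two diagnoses, and the names by_id, pairs, pair_df, tab,
best and co are my own additions for illustration. The crossprod() line
at the end is a different technique, included only as a cross-check.

```r
# Toy data from the question: two IDs with three diagnoses each.
d <- data.frame(ID = gl(2, 3), F = c("A", "B", "C", "A", "C", "D"))

# Step 1: diagnoses per ID, de-duplicated and sorted so that a pair is
# always written in alphabetical order (assumption: order is irrelevant).
by_id <- lapply(split(as.character(d$F), d$ID),
                function(x) sort(unique(x)))

# combn() fails when asked for pairs from fewer than two values, so IDs
# with a single diagnosis are dropped here.
by_id <- Filter(function(x) length(x) >= 2, by_id)

# Step 2: all pairs within each ID (one 2-row matrix per ID), bound
# column-wise and transposed so that each row is one pair.
pairs <- t(do.call(cbind, lapply(by_id, combn, m = 2)))

# Step 3: cross-tabulate the two members of each pair.
pair_df <- data.frame(first = pairs[, 1], second = pairs[, 2])
tab <- table(pair_df)
print(tab)

# Step 4: pull out the most frequent pair (the first one, if tied).
counts <- as.data.frame(tab, responseName = "n")
best <- counts[which.max(counts$n), ]
print(best)

# Cross-check via a different route: the cross-product of the ID-by-
# diagnosis incidence table gives a symmetric co-occurrence matrix.
# Its diagonal is the number of IDs carrying each diagnosis.
co <- crossprod(table(unique(d)))
print(co)
```

For a large number of IDs the crossprod() route may well be the more
convenient one, since it avoids building the pairs explicitly, but I
have not timed either version.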
HTH,
Marc Schwartz
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help