On Feb 6, 2008 8:08 AM, Waterman, DG (David) <[EMAIL PROTECTED]> wrote:
> Hi,
> I have a data frame consisting of coordinates on a 10*10 grid, i.e.
>
> > example
>    x y
> 1  4 5
> 2  6 7
> 3  6 6
> 4  7 5
> 5  5 7
> 6  6 7
> 7  4 5
> 8  6 7
> 9  7 6
> 10 5 6
>
> What I would like to do is return a 10*10 matrix consisting of counts
> at each position, so in the above example I would have a matrix where,
> for example, cell [4,5] contains 2 and [6,7] contains 3. At the moment I
> have implemented this using a for loop over the rows of the data frame,
> however the data frames I want to process are very long so the loop
> takes many minutes to complete. Can I do this in a more efficient way?

What you are describing is essentially a cross-tabulation, so you could use

> examp
   x y
1  4 5
2  6 7
3  6 6
4  7 5
5  5 7
6  6 7
7  4 5
8  6 7
9  7 6
10 5 6
> xtabs(~ x + y, examp)
   y
x   5 6 7
  4 2 0 0
  5 0 1 1
  6 0 1 3
  7 1 1 0

This omits the rows and columns that are completely empty, but you can
work around that.

If you have a very large collection of such pairs to summarize, you could
consider the version of xtabs in the Matrix package that allows for the
argument sparse = TRUE. It does the counting by converting the "triplet"
form of a sparse matrix to the compressed column form.

If you want to do this without converting the integers in 'x' and 'y' to
factors, you can use a distinctly unobvious function like

library(Matrix)
sparsetab <- function(x, y) {
    x <- as.integer(x)
    y <- as.integer(y)
    stopifnot(length(x) == length(y))
    mx <- max(x)
    my <- max(y)
    ## Triplet form: one entry (i, j, 1) per pair; the i and j slots are 0-based.
    ## Converting to compressed column form sums duplicated (i, j) entries,
    ## which turns the 1s into counts.
    as(new("dgTMatrix", i = x - 1L, j = y - 1L, x = rep(1, length(x)),
           Dim = c(mx, my),
           Dimnames = list(as.character(1:mx), as.character(1:my))),
       "CsparseMatrix")
}

which produces

> with(examp, sparsetab(x, y))
7 x 7 sparse Matrix of class "dgCMatrix"
  1 2 3 4 5 6 7
1 . . . . . . .
2 . . . . . . .
3 . . . . . . .
4 . . . . 2 . .
5 . . . . . 1 1
6 . . . . . 1 3
7 . . . . 1 1 .
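sparsetab stores only the nonzero cells. When the grid is small and known in advance, as with the 10*10 grid in the question, a dense count matrix can also be built with base R's tabulate(), again without any factor conversion. The sketch below is not from the original post: densetab is a hypothetical name, and it assumes x and y are integer-valued coordinates in 1..nx and 1..ny.

```r
# Hypothetical helper (not from the original post): dense counts, no factors.
# Assumes x and y are integer-valued coordinates in 1..nx and 1..ny.
densetab <- function(x, y, nx = max(x), ny = max(y)) {
    x <- as.integer(x)
    y <- as.integer(y)
    stopifnot(length(x) == length(y),
              all(x >= 1L & x <= nx), all(y >= 1L & y <= ny))
    # Column-major linear index: cell [i, j] of an nx-row matrix
    # is element i + (j - 1) * nx
    idx <- x + (y - 1L) * nx
    # tabulate() counts how often each index 1..nx*ny occurs
    matrix(tabulate(idx, nbins = nx * ny), nrow = nx, ncol = ny,
           dimnames = list(seq_len(nx), seq_len(ny)))
}

examp <- data.frame(x = c(4, 6, 6, 7, 5, 6, 4, 6, 7, 5),
                    y = c(5, 7, 6, 5, 7, 7, 5, 7, 6, 6))
counts <- with(examp, densetab(x, y, nx = 10L, ny = 10L))
counts[4, 5]  # 2
counts[6, 7]  # 3
```

The dense result uses nx*ny cells of memory regardless of how many pairs are nonzero, so for large, mostly empty grids the sparse version is the better fit.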
One reason to use such a function instead of xtabs is that xtabs will
convert 'x' and 'y' to factors, and the default ordering of the levels can
be lexicographic (for example, when the coordinates are stored as character
strings), so '11' occurs before '2'. Again, you can get around that, but
the function shown above is more direct and should be fast enough for
almost any application.
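One way to get around both xtabs issues (empty rows and columns being dropped, and the level ordering) is to supply the factor levels explicitly. This is a sketch, assuming the grid coordinates are the integers 1 to 10 as in the question; the names grid_values, xf and yf are hypothetical.

```r
examp <- data.frame(x = c(4, 6, 6, 7, 5, 6, 4, 6, 7, 5),
                    y = c(5, 7, 6, 5, 7, 7, 5, 7, 6, 6))
grid_values <- 1:10  # assumed set of grid coordinates
# Explicit levels: every grid value becomes a level, in numeric order,
# even if it never occurs, so no row or column is dropped
examp$xf <- factor(examp$x, levels = grid_values)
examp$yf <- factor(examp$y, levels = grid_values)
counts <- xtabs(~ xf + yf, data = examp)
dim(counts)       # 10 10
counts["4", "5"]  # 2
counts["6", "7"]  # 3
```

This still goes through factors, so for very long inputs the factor-free approaches may be preferable.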