On Feb 6, 2008 8:08 AM, Waterman, DG (David)
<[EMAIL PROTECTED]> wrote:
> Hi,

> I have a data frame consisting of coordinates on a 10*10 grid, i.e.

> > example
>     x  y
> 1   4  5
> 2   6  7
> 3   6  6
> 4   7  5
> 5   5  7
> 6   6  7
> 7   4  5
> 8   6  7
> 9   7  6
> 10  5  6

> What I would like to do is return a 10*10 matrix consisting of counts
> at each position, so in the above example I would have a matrix where,
> for example, cell [4,5] contains 2 and [6,7] contains 3. At the moment I
> have implemented this using a for loop over the rows of the data frame,
> however the data frames I want to process are very long so the loop
> takes many minutes to complete. Can I do this in a more efficient way?

What you are describing is essentially a cross-tabulation so you could use

> examp
   x y
1  4 5
2  6 7
3  6 6
4  7 5
5  5 7
6  6 7
7  4 5
8  6 7
9  7 6
10 5 6
> xtabs(~ x + y, examp)
   y
x   5 6 7
  4 2 0 0
  5 0 1 1
  6 0 1 3
  7 1 1 0

This omits the rows and columns which are completely empty, but you can
work around that by making 'x' and 'y' factors with the full set of
levels 1:10 before tabulating.
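
One possible work-around, as a sketch: declare the full set of levels before tabulating, so that xtabs keeps the empty rows and columns. The names 'grid' and 'full', and the assumption that positions run from 1 to 10 on both axes, are mine and not part of the original data.

```r
## The example data from the question
examp <- data.frame(x = c(4, 6, 6, 7, 5, 6, 4, 6, 7, 5),
                    y = c(5, 7, 6, 5, 7, 7, 5, 7, 6, 6))

## Assumption: grid positions run from 1 to 10 on both axes
grid <- 1:10

## Factors with explicit levels keep the unobserved positions,
## because xtabs does not drop unused levels by default
full <- xtabs(~ x + y,
              data.frame(x = factor(examp$x, levels = grid),
                         y = factor(examp$y, levels = grid)))

dim(full)     # 10 10
full[4, 5]    # 2
full[6, 7]    # 3
```

Giving the levels explicitly also fixes their order, so the rows and columns come out in grid order regardless of how the coordinates are stored.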

If you have a very large collection of such pairs to summarize you
could consider the version of xtabs in the Matrix package that allows
for the argument sparse = TRUE.  That converts the "triplet" form of a
sparse matrix to the compressed column form, and the conversion does
the counting because the values of duplicated (i, j) entries are summed.
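
As a rough illustration of the sparse route: in current versions of R the sparse = TRUE argument is accepted by xtabs in the stats package itself (the Matrix package must still be installed), so the call below assumes a reasonably recent R rather than the 2008 Matrix version. The object name 'st' is mine.

```r
library(Matrix)   # provides the sparse matrix classes

## The example data from the question
examp <- data.frame(x = c(4, 6, 6, 7, 5, 6, 4, 6, 7, 5),
                    y = c(5, 7, 6, 5, 7, 7, 5, 7, 6, 6))

## Same formula as before, but the result is a sparse matrix
st <- xtabs(~ x + y, examp, sparse = TRUE)

st              # only the observed rows and columns appear
st["6", "7"]    # 3, indexing by the dimnames
```

As with the dense xtabs result, only the observed values of 'x' and 'y' appear as rows and columns.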

If you want to do this without converting the integers in 'x' and 'y'
to factors you can use a distinctly unobvious function like

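library(Matrix)
sparsetab <- function(x, y)
{
    x <- as.integer(x)
    y <- as.integer(y)
    stopifnot(length(x) == length(y))
    mx <- max(x)
    my <- max(y)
    ## Triplet form: one entry of 1 per (x, y) pair, with 0-based indices.
    ## Coercion to compressed column form sums the duplicated pairs,
    ## which yields the counts.
    as(new("dgTMatrix", i = x - 1L, j = y - 1L,
           x = rep(1, length(x)), Dim = c(mx, my),
           Dimnames = list(as.character(1:mx), as.character(1:my))),
       "CsparseMatrix")
}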

which produces

> with(examp, sparsetab(x, y))
7 x 7 sparse Matrix of class "dgCMatrix"
  1 2 3 4 5 6 7
1 . . . . . . .
2 . . . . . . .
3 . . . . . . .
4 . . . . 2 . .
5 . . . . . 1 1
6 . . . . . 1 3
7 . . . . 1 1 .

One reason to use such a function instead of xtabs is that xtabs
will convert 'x' and 'y' to factors.  For numeric 'x' and 'y' the levels
are sorted numerically, but if the coordinates are stored as character
strings the default ordering of the levels is lexicographic, so '11'
occurs before '2'.  Again, you can get around that but the function
shown above is more direct and should be fast enough for almost any
application.
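
If the result must always cover the full 10*10 grid rather than stopping at max(x) and max(y), one alternative sketch uses sparseMatrix(), a constructor available in later versions of the Matrix package that adds up the values of repeated (i, j) pairs. The function name 'gridtab' and the arguments 'nr' and 'nc' are mine, for illustration only.

```r
library(Matrix)

gridtab <- function(x, y, nr = 10L, nc = 10L)
{
    x <- as.integer(x)
    y <- as.integer(y)
    stopifnot(length(x) == length(y),
              all(x >= 1L, x <= nr), all(y >= 1L, y <= nc))
    ## sparseMatrix() takes 1-based indices; repeated (i, j) pairs
    ## have their x values added, which gives the counts
    sparseMatrix(i = x, j = y, x = rep(1, length(x)), dims = c(nr, nc))
}

## The example data from the question
examp <- data.frame(x = c(4, 6, 6, 7, 5, 6, 4, 6, 7, 5),
                    y = c(5, 7, 6, 5, 7, 7, 5, 7, 6, 6))

counts <- gridtab(examp$x, examp$y)
dim(counts)                # 10 10
counts[4, 5]               # 2
as.matrix(counts)[6, 7]    # 3, after conversion to an ordinary matrix
```

Fixing the dimensions up front means every data frame produces a matrix of the same shape, at the cost of having to know the grid size in advance.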

