The problem with interaction() is that it doesn't scale as the number of columns increases:

> set.seed(1)
> mat2 <- matrix(sample(20,5e4,rep=T), 1e4)
> invisible(gc()); system.time(z0 <- f0(mat2))
[1] 1.58 0.01 1.85 NA NA
> invisible(gc()); system.time(z1 <- f1(mat2))
[1] 1.57 0.00 1.66 NA NA
> invisible(gc()); system.time(z2 <- f2g(mat2))
[1] 34.14 0.60 57.45 NA NA

[f2g is the slightly modified version of f2 to allow for any number
of columns:

f2g <- function(mat) as.numeric(interaction(as.data.frame(mat), drop=T))]

With 10 columns in the matrix, f0 and f1 ran fine in under 10 seconds,
but f2g started thrashing, and ran out of memory after a while. If you
look at how interaction() is written you'll quickly see why...

Andy

> From: Gabor Grothendieck
>
> Waichler, Scott R <Scott.Waichler <at> pnl.gov> writes:
>
> >
> > Thanks to all of you who responded to my help request.
> > Here is the very efficient upshot of your advice:
> >
> > > mat2 <- apply(mat, 1, paste, collapse=":")
> > > vec <- match(mat2, unique(mat2))
> > > vec
> > [1] 1 2 1 1 2 3
> >
> >
> > P.S. I found that Andy Liaw's method didn't preserve the
> > index order that I wanted; it yields
> >
> > 2 3 2 2 3 1
> >
> > To get the order of integers I was looking for required an
> > invocation of unique:
> >
> > as.numeric(factor(apply(mat, 1, paste, collapse=":"),
> >     levels=unique(apply(mat, 1, paste, collapse=":"))))
> >
> > But the first method above is obviously cleaner and is twice
> > as fast, only 9 seconds for a 100000 row matrix on an ordinary PC.
>
> The interaction solution gives an identical result, is shorter and
> is one or two orders of magnitude faster. Here is a
> comparison of the three:
>
> R> set.seed(1)
> R> mat <- matrix(sample(20,100000,rep=T),50000)
> R>
> R> f0 <- function(mat) {
> + mat2 <- apply(mat, 1, paste, collapse=":");
> + match(mat2, unique(mat2))
> + }
> R>
> R>
> R> f1 <- function(mat) { z <- apply(mat, 1, paste, collapse=":")
> + as.numeric(factor(z,levels=unique(z)))
> + }
> R>
> R> f2 <- function(mat) as.numeric(interaction(mat[,1],mat[,2],drop=T))
> R>
> R> dummy <- gc(); system.time(z0 <- f0(mat))
> [1] 5.24 0.02 5.52 NA NA
> R> dummy <- gc(); system.time(z1 <- f1(mat))
> [1] 5.18 0.00 5.52 NA NA
> R> dummy <- gc(); system.time(z2 <- f2(mat))
> [1] 0.1 0.0 0.1 NA NA
> R> all.equal(z0,z1)
> [1] TRUE
> R> all.equal(z0,z2)
> [1] TRUE
> R> all.equal(z2,z1)
> [1] TRUE
>
> ______________________________________________
> [EMAIL PROTECTED] mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
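
For matrices with many columns, one option that keeps the first-appearance ordering of f0 is to build the row keys column by column rather than row by row. The sketch below is not from the thread: f0v is a hypothetical name, and it assumes the matrix entries paste to unambiguous strings (as the small integers used here do). It also computes 20^k for a few column counts, which is the number of level combinations a full cross of k 20-level factors would contain; that growth may be what makes f2g thrash at 10 columns, though this has not been checked against the interaction() source.

```r
# Hypothetical variant of f0 (name f0v is an assumption, not from the thread):
# build one key per row by pasting whole columns together.
f0v <- function(mat) {
  # paste() is vectorised, so a single call handles every row at once
  key <- do.call(paste, c(as.data.frame(mat), sep = ":"))
  # number the keys in order of first appearance, as f0 does
  match(key, unique(key))
}

# Small made-up example matrix with repeated rows
mat <- matrix(c(1, 2,
                3, 4,
                1, 2,
                1, 2,
                3, 4,
                5, 6), ncol = 2, byrow = TRUE)
vec <- f0v(mat)
print(vec)

# Upper bound on level combinations for 2, 5 and 10 columns of 20 values each
combos <- 20^c(2, 5, 10)
print(combos)
```

The number of keys built by f0v is bounded by the number of rows, so its memory use should grow with the data rather than with the product of the column level counts.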