OK. Good point. I have revised the interaction solution. It is unfortunately no longer as short, but with 5 columns it is nearly an order of magnitude faster than the other two. It is solution f4, below:
R> set.seed(1)
R> mat <- matrix(sample(20,100000,rep=T),nc=5)
R>
R> f0 <- function(mat) {
+ mat2 <- apply(mat, 1, paste, collapse=":")
+ match(mat2, unique(mat2))
+ }
R>
R> f1 <- function(mat) {
+ z <- apply(mat, 1, paste, collapse=":")
+ as.numeric(factor(z,levels=unique(z)))
+ }
R>
R> f4 <- function(mat) {
+ z <- apply(mat,2,factor)
+ as.numeric(interaction(z %*% ((max(z)+1)^(seq(ncol(z))-1)),drop=T))
+ }
R>
R>
R> invisible(gc()); system.time(z0 <- f0(mat))
[1] 2.05 0.00 2.17 NA NA
R> invisible(gc()); system.time(z1 <- f1(mat))
[1] 2.22 0.01 2.37 NA NA
R> invisible(gc()); system.time(z4 <- f4(mat))
[1] 0.26 0.00 0.30 NA NA
R>
R> all.equal(z0,z1)
[1] TRUE
R> all.equal(z0,z4)
[1] TRUE
R> all.equal(z4,z1)
[1] TRUE
R>
R>

Liaw, Andy <andy_liaw <at> merck.com> writes:

:
: The problem with interaction() is that it doesn't scale with increasing
: number of columns:
:
: > set.seed(1)
: > mat2 <- matrix(sample(20,5e4,rep=T), 1e4)
: > invisible(gc()); system.time(z0 <- f0(mat2))
: [1] 1.58 0.01 1.85 NA NA
: > invisible(gc()); system.time(z1 <- f1(mat2))
: [1] 1.57 0.00 1.66 NA NA
: > invisible(gc()); system.time(z2 <- f2g(mat2))
: [1] 34.14 0.60 57.45 NA NA
:
: [f2g is the slightly modified version of f2 to allow for any number of
: columns:
: f2g <- function(mat) as.numeric(interaction(as.data.frame(mat), drop=T))]
:
: With 10 columns in the matrix, f0 and f1 ran fine in under 10 seconds, but
: f2g started thrashing, and ran out of memory after a while. If you look at
: how interaction() is written you'll quickly see why...
:
: Andy
:
: > From: Gabor Grothendieck
: >
: > Waichler, Scott R <Scott.Waichler <at> pnl.gov> writes:
: >
: > >
: > > Thanks to all of you who responded to my help request.
: > > Here is the very efficient upshot of your advice:
: > >
: > > > mat2 <- apply(mat, 1, paste, collapse=":")
: > > > vec <- match(mat2, unique(mat2))
: > > > vec
: > > [1] 1 2 1 1 2 3
: > >
: > >
: > > P.S.
I found that Andy Liaw's method didn't preserve the
: > > index order that I wanted; it yields
: > >
: > > 2 3 2 2 3 1
: > >
: > > To get the order of integers I was looking for required an
: > > invocation of unique:
: > >
: > > as.numeric(factor(apply(mat, 1, paste, collapse=":"),
: > > levels=unique(apply(mat, 1, paste, collapse=":"))))
: > >
: > > But the first method above is obviously cleaner and is twice
: > > as fast, only 9 seconds for a 100000 row matrix on an ordinary PC.
: >
: > The interaction solution gives an identical result, is shorter and
: > is one or two orders of magnitude faster. Here is a
: > comparison of the three:
: >
: > R> set.seed(1)
: > R> mat <- matrix(sample(20,100000,rep=T),50000)
: > R>
: > R> f0 <- function(mat) {
: > + mat2 <- apply(mat, 1, paste, collapse=":");
: > + match(mat2, unique(mat2))
: > + }
: > R>
: > R>
: > R> f1 <- function(mat) { z <- apply(mat, 1, paste, collapse=":")
: > + as.numeric(factor(z,levels=unique(z)))
: > + }
: > R>
: > R> f2 <- function(mat) as.numeric(interaction(mat[,1],mat[,2],drop=T))
: > R>
: > R> dummy <- gc(); system.time(z0 <- f0(mat))
: > [1] 5.24 0.02 5.52 NA NA
: > R> dummy <- gc(); system.time(z1 <- f1(mat))
: > [1] 5.18 0.00 5.52 NA NA
: > R> dummy <- gc(); system.time(z2 <- f2(mat))
: > [1] 0.1 0.0 0.1 NA NA
: > R> all.equal(z0,z1)
: > [1] TRUE
: > R> all.equal(z0,z2)
: > [1] TRUE
: > R> all.equal(z2,z1)
: > [1] TRUE
: >
: > ______________________________________________
: > R-help <at> stat.math.ethz.ch mailing list
: > https://www.stat.math.ethz.ch/mailman/listinfo/r-help
: > PLEASE do read the posting guide!
: > http://www.R-project.org/posting-guide.html
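
For anyone who wants to see the idea behind f4 in isolation, here is a minimal sketch. It is not the code posted above: row_ids is a made-up name, the small test matrix is invented for illustration, and it uses match() instead of interaction() so that the ids come out in order of first appearance, as in f0. Each column is recoded to 0-based integer codes, and each row is then read as a number written in base (largest code + 1), so a single matrix product gives one numeric key per row. The keys are only exact while base^ncol(mat) stays below 2^53, so treat this as a sketch under that assumption rather than a general solution.

```r
# Hypothetical helper (not from the thread): one integer id per distinct row,
# numbered in order of first appearance.
row_ids <- function(mat) {
  stopifnot(is.matrix(mat), nrow(mat) > 0, ncol(mat) > 0)
  # Recode every column to 0-based codes 0, 1, 2, ...
  codes <- vapply(seq_len(ncol(mat)),
                  function(j) match(mat[, j], unique(mat[, j])) - 1,
                  numeric(nrow(mat)))
  codes <- matrix(codes, nrow = nrow(mat))  # vapply drops dims for 1-row input
  # Treat each row as the digits of a number in base (max code + 1).
  base <- max(codes) + 1
  key <- as.vector(codes %*% base^(seq_len(ncol(codes)) - 1))
  # Distinct keys correspond to distinct rows; number them as they appear.
  match(key, unique(key))
}

# Invented 6 x 2 example with three distinct rows
m <- matrix(c(1, 2, 1, 1, 2, 3,
              5, 6, 5, 5, 6, 7), ncol = 2)
row_ids(m)
# [1] 1 2 1 1 2 3
```

Because the key is built with one matrix product instead of one paste() call per row, the cost grows with the number of columns only through that product, which is the point of the f4 revision.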