On Wed, Nov 26, 2008 at 8:14 AM, jim holtman <[EMAIL PROTECTED]> wrote:
> Your time is being taken up in cor.test because you are calling it
> 100,000 times.  So grin and bear it with the amount of work you are
> asking it to do.
>
> Here I am only calling it 100 times:
>
>> m1 <- matrix(rnorm(10000), ncol=100)
>> m2 <- matrix(rnorm(10000), ncol=100)
>> Rprof('/tempxx.txt')
>> system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1,
>> function(y) { cor.test(x,y)$p.value }) }))
>    user  system elapsed
>    8.86    0.00    8.89
>>
>
> so my guess is that calling it 100,000 times will take:  100,000 *
> 0.0886 seconds or about 3 hours.
You can make it ~3 times faster by vectorising the testing:

  m1 <- matrix(rnorm(10000), ncol=100)
  m2 <- matrix(rnorm(10000), ncol=100)

  system.time(cor.pvalues <- apply(m1, 1, function(x) {
    apply(m2, 1, function(y) { cor.test(x,y)$p.value })
  }))

  system.time({
    r <- apply(m1, 1, function(x) {
      apply(m2, 1, function(y) { cor(x,y) })
    })
    # degrees of freedom are n - 2, where n is the length of each row
    # vector, i.e. ncol(m1); nrow(m1) only gives the same value here
    # because m1 happens to be square
    df <- ncol(m1) - 2
    t <- sqrt(df) * r / sqrt(1 - r ^ 2)
    p <- pt(t, df)
    p <- 2 * pmin(p, 1 - p)   # two-sided p-value
  })

  all.equal(cor.pvalues, p)

You can make cor() much faster by stripping away all the error-checking
code and calling the internal C function directly (as suggested by the
Rprof output):

  system.time({
    r <- apply(m1, 1, function(x) {
      apply(m2, 1, function(y) { cor(x,y) })
    })
  })

  system.time({
    r2 <- apply(m1, 1, function(x) {
      # undocumented internal call, not part of R's public API; the
      # arguments mirror what cor() passes for the default Pearson case
      apply(m2, 1, function(y) { .Internal(cor(x, y, 4L, FALSE)) })
    })
  })

That is 1.5 s vs 0.2 s on my computer.

Combining both changes gives me a ~25 times speed-up.  I suspect you can
do even better if you think about which calculations are duplicated when
computing the correlations.

Hadley

-- 
http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
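Following up on the hint about duplicated calculations, one possible next step is to let cor() compute every pairwise correlation in a single call. This is a sketch and was not tested in the thread. Given two matrices, cor() correlates each column of the first with each column of the second, so transposing m1 and m2 turns their rows into columns. The p-values then come from the same t statistic as above, applied to the whole result matrix at once. The seed and the names r_all, t_stat and p_all are illustrative choices, not part of the original code.

```r
# A minimal sketch, assuming the same setup as above: each row of m1 and
# m2 is a vector to correlate, with ncol() observations per vector.
set.seed(1)  # hypothetical seed, only to make the example reproducible
m1 <- matrix(rnorm(10000), ncol = 100)
m2 <- matrix(rnorm(10000), ncol = 100)

# cor() on two matrices correlates every column of the first argument
# with every column of the second, so transposing turns rows into columns.
# r_all[j, i] is cor(m1[i, ], m2[j, ]), the same layout the nested
# apply() calls produce.
r_all <- cor(t(m2), t(m1))

# Two-sided p-values from the t statistic with n - 2 degrees of freedom,
# where n is the number of observations in each vector.
df <- ncol(m1) - 2
t_stat <- sqrt(df) * r_all / sqrt(1 - r_all^2)
p_all <- 2 * pt(-abs(t_stat), df)

# Reference result from the original nested apply() over cor.test()
cor.pvalues <- apply(m1, 1, function(x) {
  apply(m2, 1, function(y) cor.test(x, y)$p.value)
})
all.equal(cor.pvalues, p_all, check.attributes = FALSE)
```

Writing the two-sided p-value as 2 * pt(-abs(t_stat), df) is mathematically the same as 2 * pmin(p, 1 - p), but it avoids computing 1 - p, which loses precision when p is very close to 1. The single cor() call replaces the 10,000 separate calls in this example; what remains is element-wise arithmetic on one 100 x 100 matrix.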