Re: [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices
Your time is being taken up in cor.test because you are calling it 100,000 times. So grin and bear it with the amount of work you are asking it to do.

Here I am only calling it 100 times:

    m1 <- matrix(rnorm(1), ncol=100)
    m2 <- matrix(rnorm(1), ncol=100)
    Rprof('/tempxx.txt')
    system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor.test(x,y)$p.value }) }))
       user  system elapsed
       8.86    0.00    8.89

so my guess is that calling it 100,000 times will take 100,000 * 0.0886 seconds, or about 3 hours.

If you run Rprof, you will see that it is spending most of its time there:

    0 8.8 root
    1. 8.8 apply
    2. . 8.8 FUN
    3. . . 8.8 apply
    4. . . . 8.7 FUN
    5. . . . . 8.6 cor.test
    6. . . . . . 8.4 cor.test.default
    7. . . . . . . 2.4 match.arg
    8. . . . . . . . 1.7 eval
    9. . . . . . . . . 1.4 deparse
    10. . . . . . . . . . 0.6 .deparseOpts
    11. . . . . . . . . . . 0.2 pmatch
    11. . . . . . . . . . . 0.1 sum
    10. . . . . . . . . . 0.5 %in%
    11. . . . . . . . . . . 0.3 match
    12. . . . . . . . . . . . 0.3 is.factor
    13. . . . . . . . . . . . . 0.3 inherits
    8. . . . . . . . 0.2 formals
    9. . . . . . . . . 0.2 sys.function
    7. . . . . . . 2.1 cor
    8. . . . . . . . 1.1 match.arg
    9. . . . . . . . . 0.7 eval
    10. . . . . . . . . . 0.6 deparse
    11. . . . . . . . . . . 0.3 .deparseOpts
    12. . . . . . . . . . . . 0.1 pmatch
    11. . . . . . . . . . . 0.2 %in%
    12. . . . . . . . . . . . 0.2 match
    13. . . . . . . . . . . . . 0.1 is.factor
    14. . . . . . . . . . . . . . 0.1 inherits
    9. . . . . . . . . 0.1 formals
    8. . . . . . . . 0.5 stopifnot
    9. . . . . . . . . 0.2 match.call
    8. . . . . . . . 0.1 pmatch
    8. . . . . . . . 0.1 is.data.frame
    9. . . . . . . . . 0.1 inherits
    7. . . . . . . 1.5 paste
    8. . . . . . . . 1.4 deparse
    9. . . . . . . . . 0.6 .deparseOpts
    10. . . . . . . . . . 0.3 pmatch
    10. . . . . . . . . . 0.1 any
    9. . . . . . . . . 0.6 %in%
    10. . . . . . . . . . 0.6 match
    11. . . . . . . . . . . 0.5 is.factor
    12. . . . . . . . . . . . 0.4 inherits
    13. . . . . . . . . . . . . 0.2 mode
    7. . . . . . . 0.4 switch
    8. . . . . . . . 0.1 qnorm
    7. . . . . . . 0.2 pt
    5. . . . . 0.1 $

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
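For anyone who wants to repeat this kind of profiling, here is a minimal sketch using only base R. The matrix sizes (100 rows by 100 columns each), the temporary file, and the 5 ms sampling interval are assumptions chosen for illustration, not the settings used in this thread. Note that summaryRprof() gives a flat ranking of functions, not the nested call tree printed above, which came from a separate script.

```r
# Illustrative inputs (assumed sizes): 100 variables in rows, 100 observations each.
set.seed(1)
m1 <- matrix(rnorm(100 * 100), ncol = 100)
m2 <- matrix(rnorm(100 * 100), ncol = 100)

prof_file <- tempfile(fileext = ".out")   # hypothetical location for the profile data
Rprof(prof_file, interval = 0.005)        # start sampling the call stack every 5 ms
cor.pvalues <- apply(m1, 1, function(x) {
  apply(m2, 1, function(y) cor.test(x, y)$p.value)
})
Rprof(NULL)                               # stop profiling

# Rank functions by time spent in them plus everything they call.
prof <- summaryRprof(prof_file)
head(prof$by.total, 10)
```

The by.total table should put cor.test and cor.test.default near the top, with the argument-checking helpers (match.arg, deparse, and so on) accounting for a large share of the samples.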
Re: [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices
He might try rcorr from Hmisc instead. Using your test suite, it gives about a 20% improvement on my MacPro:

    library(Hmisc)  # provides rcorr()
    m1 <- matrix(rnorm(1), ncol=100)
    m2 <- matrix(rnorm(1), ncol=100)
    Rprof('/tempxx.txt')
    system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { rcorr(x,y)$P }) }))
       user  system elapsed
      4.221   0.049   4.289

    m1 <- matrix(rnorm(1), ncol=100)
    m2 <- matrix(rnorm(1), ncol=100)
    Rprof('/tempxx.txt')
    system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor.test(x,y)$p.value }) }))
       user  system elapsed
      5.328   0.038   5.355

I'm not a smart enough programmer to figure out whether there might be an even more efficient method that takes advantage of rcorr's implicit looping through a set of columns to produce an all-combinations return.

--
David Winsemius, MD
Heritage Labs
Re: [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices
Hi Daren,

Here is another approach that is a little bit faster, taking into account that I'm using your original matrices. My session info is at the end. I'm using a 2.4 GHz Core 2 Duo processor and 3 GB of RAM.

    # Data
    set.seed(123)
    m1 <- matrix(rnorm(10), ncol=100)
    m2 <- matrix(rnorm(10), ncol=100)
    colnames(m1)=paste('m1_',1:100,sep="")
    colnames(m2)=paste('m2_',1:100,sep="")

    # Combinations
    combs=expand.grid(colnames(m1),colnames(m2))

    # ---
    # Option 1
    #
    system.time(apply(combs,1,function(x) cor.test(m1[,x[1]],m2[,x[2]])$p.value)->pvalues1)
    #   user  system elapsed
    #   8.12    0.01    8.20

    # ---
    # Option 2
    #
    require(Hmisc)
    system.time(apply(combs,1,function(x) rcorr(m1[,x[1]],m2[,x[2]])$P[2])->pvalues2)
    #   user  system elapsed
    #   7.00    0.00    7.02

HTH,

Jorge

    # - Session Info
    R version 2.8.0 Patched (2008-11-08 r46864)
    i386-pc-mingw32

    locale:
    LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base
Re: [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices
On Wed, Nov 26, 2008 at 8:14 AM, jim holtman wrote:
> so my guess is that calling it 100,000 times will take: 100,000 * 0.0886 seconds or about 3 hours.

You can make it ~3 times faster by vectorising the testing:

    m1 <- matrix(rnorm(1), ncol=100)
    m2 <- matrix(rnorm(1), ncol=100)

    system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor.test(x,y)$p.value })}))

    system.time({
      r <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor(x,y) })})
      df <- ncol(m1) - 2   # each row x holds ncol(m1) observations
      t <- sqrt(df) * r / sqrt(1 - r ^ 2)
      p <- pt(t, df)
      p <- 2 * pmin(p, 1 - p)
    })

    all.equal(cor.pvalues, p)

You can make cor much faster by stripping away all the error-checking code and calling the internal C function directly (suggested by the Rprof output):

    system.time({
      r <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor(x,y) })})
    })

    system.time({
      r2 <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { .Internal(cor(x, y, 4L, FALSE)) })})
    })

1.5 s vs 0.2 s on my computer. Combining both changes gives me a ~25 times speed-up. I suspect you can do even better if you think about what calculations are being duplicated in the computation of the correlations.

Hadley

--
http://had.co.nz/
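As a rough sketch of the same idea taken one step further, cor() itself accepts two matrices and returns every pairwise correlation in a single call, so both apply loops can go. The sizes below (5 and 8 rows, 100 observations per row) are assumptions picked to keep the check quick, and 2 * pt(-abs(t), df) is swapped in here as an equivalent way of writing the two-sided p-value rather than the 2 * pmin(p, 1 - p) form used in the message above.

```r
# Assumed layout, as in the thread: each ROW is one variable and each COLUMN
# one observation, so n = ncol. Sizes are illustrative only.
set.seed(42)
m1 <- matrix(rnorm(5 * 100), ncol = 100)
m2 <- matrix(rnorm(8 * 100), ncol = 100)

# cor() correlates columns, so transpose to correlate rows of m1 with rows of m2.
r <- cor(t(m1), t(m2))             # 5 x 8 matrix of Pearson correlations

n     <- ncol(m1)                  # observations per variable
df    <- n - 2                     # degrees of freedom of the t statistic
tstat <- sqrt(df) * r / sqrt(1 - r^2)
p     <- 2 * pt(-abs(tstat), df)   # two-sided p-values for the whole matrix at once

# Spot check one pair against cor.test()
p_ref <- cor.test(m1[2, ], m2[3, ])$p.value
all.equal(p[2, 3], p_ref)
```

Note that the result here is indexed as p[row of m1, row of m2], which is the transpose of what the nested apply version returns.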
Re: [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices
Out of desperation, I wrote the following function, though Hadley beat me to it :P. Thanks everyone for the great help.

    # Two-sided p-values for Pearson correlations r computed from n observations
    cor.p.values <- function(r, n) {
      df <- n - 2
      STATISTIC <- c(sqrt(df) * r / sqrt(1 - r^2))
      p <- pt(STATISTIC, df)
      return(2 * pmin(p, 1 - p))
    }
Re: [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices
You can do much better by doing the correlations as a matrix operation:

    > system.time({
    + m1<-scale(m1)
    + m2<-scale(m2)
    + r<-crossprod(m1,m2)/100
    + df<-100
    + tstat<-sqrt(df)*r/sqrt(1-r^2)
    + p<-pt(tstat,df)
    + })
       user  system elapsed
      0.025   0.004   0.028

There might be a factor of n/(n-1) missing somewhere, which would be fixable if you could bring yourself to care about it.

-thomas

Thomas Lumley
Assoc. Professor, Biostatistics
[EMAIL PROTECTED]
University of Washington, Seattle
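For readers who do care about the n/(n-1) factor, here is one way the matrix approach might be written so that it matches cor.test exactly. This is a sketch under my own assumptions, not the code timed above: it assumes variables are in rows (as in the original question), so it transposes before calling scale(); it divides by n - 1 because scale() uses the sample standard deviation; it uses n - 2 degrees of freedom; and it doubles the tail probability to get a two-sided p-value. The matrix sizes are made up for the check.

```r
set.seed(7)
m1 <- matrix(rnorm(6 * 100), ncol = 100)   # illustrative sizes: 6 variables in rows
m2 <- matrix(rnorm(4 * 100), ncol = 100)   # 4 variables in rows
n  <- ncol(m1)                             # observations per variable

# scale() standardises columns, so transpose first:
# the columns of z1 and z2 are the original rows.
z1 <- scale(t(m1))   # n x 6, each column has mean 0 and sample sd 1
z2 <- scale(t(m2))   # n x 4

r     <- crossprod(z1, z2) / (n - 1)       # 6 x 4 matrix of correlations
df    <- n - 2
tstat <- sqrt(df) * r / sqrt(1 - r^2)
p     <- 2 * pt(-abs(tstat), df)           # two-sided p-values

# The standardise-and-multiply result should agree with cor()
all.equal(r, cor(t(m1), t(m2)), check.attributes = FALSE)
```

Each row is standardised once instead of once per pair, which is presumably the duplicated calculation mentioned earlier in the thread.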
[R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices
My two matrices are roughly the sizes of m1 and m2. I tried using two nested apply calls and cor.test to compute the correlation p-values. After more than an hour, the code is still running. Please help me make it more efficient.

    m1 <- matrix(rnorm(10), ncol=100)
    m2 <- matrix(rnorm(1000), ncol=100)
    cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor.test(x,y)$p.value }) })