Re: [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices

2008-11-26 Thread jim holtman
Your time is being taken up in cor.test because you are calling it
100,000 times.  So grin and bear it with the amount of work you are
asking it to do.

Here I am only calling it 100 times:

 m1 <- matrix(rnorm(1), ncol=100)
 m2 <- matrix(rnorm(1), ncol=100)
 Rprof('/tempxx.txt')
 system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1,
 function(y) { cor.test(x,y)$p.value }) }))
   user  system elapsed
   8.86    0.00    8.89


so my guess is that calling it 100,000 times will take:  100,000 *
0.0886 seconds or about 3 hours.
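
The same kind of extrapolation can be reproduced on any machine. The following is only a rough sketch: the 20-row matrices, the variable names, and the assumption that run time scales linearly with the number of calls are all illustrative and not taken from the thread.

```r
# Time a small batch of cor.test() calls, then extrapolate linearly.
set.seed(1)
x <- matrix(rnorm(20 * 100), ncol = 100)   # 20 rows (hypothetical small size)
y <- matrix(rnorm(20 * 100), ncol = 100)

n_calls <- nrow(x) * nrow(y)               # 400 calls in the timed run
elapsed <- system.time(
  for (i in seq_len(nrow(x))) {
    for (j in seq_len(nrow(y))) {
      cor.test(x[i, ], y[j, ])$p.value
    }
  }
)[["elapsed"]]

per_call       <- elapsed / n_calls        # seconds per cor.test() call
target_calls   <- 100000                   # the call count quoted above
estimate_hours <- per_call * target_calls / 3600
estimate_hours
```

The per-call cost depends heavily on the hardware and R version, so the estimate is only an order-of-magnitude guide.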

If you run Rprof, you will see it is spending most of its time there:

  0   8.8 root
  1. 8.8 apply
  2. . 8.8 FUN
  3. . . 8.8 apply
  4. . . . 8.7 FUN
  5. . . . . 8.6 cor.test
  6. . . . . . 8.4 cor.test.default
  7. . . . . . . 2.4 match.arg
  8. . . . . . . . 1.7 eval
  9. . . . . . . . . 1.4 deparse
 10. . . . . . . . . . 0.6 .deparseOpts
 11. . . . . . . . . . . 0.2 pmatch
 11. . . . . . . . . . . 0.1 sum
 10. . . . . . . . . . 0.5 %in%
 11. . . . . . . . . . . 0.3 match
 12. . . . . . . . . . . . 0.3 is.factor
 13. . . . . . . . . . . . . 0.3 inherits
  8. . . . . . . . 0.2 formals
  9. . . . . . . . . 0.2 sys.function
  7. . . . . . . 2.1 cor
  8. . . . . . . . 1.1 match.arg
  9. . . . . . . . . 0.7 eval
 10. . . . . . . . . . 0.6 deparse
 11. . . . . . . . . . . 0.3 .deparseOpts
 12. . . . . . . . . . . . 0.1 pmatch
 11. . . . . . . . . . . 0.2 %in%
 12. . . . . . . . . . . . 0.2 match
 13. . . . . . . . . . . . . 0.1 is.factor
 14. . . . . . . . . . . . . . 0.1 inherits
  9. . . . . . . . . 0.1 formals
  8. . . . . . . . 0.5 stopifnot
  9. . . . . . . . . 0.2 match.call
  8. . . . . . . . 0.1 pmatch
  8. . . . . . . . 0.1 is.data.frame
  9. . . . . . . . . 0.1 inherits
  7. . . . . . . 1.5 paste
  8. . . . . . . . 1.4 deparse
  9. . . . . . . . . 0.6 .deparseOpts
 10. . . . . . . . . . 0.3 pmatch
 10. . . . . . . . . . 0.1 any
  9. . . . . . . . . 0.6 %in%
 10. . . . . . . . . . 0.6 match
 11. . . . . . . . . . . 0.5 is.factor
 12. . . . . . . . . . . . 0.4 inherits
 13. . . . . . . . . . . . . 0.2 mode
  7. . . . . . . 0.4 switch
  8. . . . . . . . 0.1 qnorm
  7. . . . . . . 0.2 pt
  5. . . . . 0.1 $

On Tue, Nov 25, 2008 at 11:55 PM, Daren Tan [EMAIL PROTECTED] wrote:

 My two matrices are roughly the sizes of m1 and m2. I tried using two apply
 calls and cor.test to compute the correlation p.values. After more than an
 hour, the code is still running. Please help me make it more efficient.

 m1 <- matrix(rnorm(10), ncol=100)
 m2 <- matrix(rnorm(1000), ncol=100)

 cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, function(y) {
 cor.test(x,y)$p.value }) })

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices

2008-11-26 Thread David Winsemius
He might try rcorr from Hmisc instead. Using your test suite, it gives  
about a 20% improvement on my MacPro:


 m1 <- matrix(rnorm(1), ncol=100)
 m2 <- matrix(rnorm(1), ncol=100)
 Rprof('/tempxx.txt')
 system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1,
 function(y) { rcorr(x,y)$P }) }))

   user  system elapsed
  4.221   0.049   4.289

 m1 <- matrix(rnorm(1), ncol=100)
 m2 <- matrix(rnorm(1), ncol=100)
 Rprof('/tempxx.txt')
 system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1,
 function(y) { cor.test(x,y)$p.value }) }))

   user  system elapsed
  5.328   0.038   5.355

I'm not a smart enough programmer to figure out whether there might be
an even more efficient method that takes advantage of rcorr's implicit
looping through a set of columns to produce an all-combinations
return.


--
David Winsemius, MD
Heritage Labs



Re: [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices

2008-11-26 Thread Jorge Ivan Velez
Hi Daren,
Here is another approach that is a little bit faster, taking into account
that I'm using your original matrices.  My session info is at the end. I'm
using a 2.4 GHz Core 2 Duo processor and 3 GB of RAM.

# Data
set.seed(123)
m1 <- matrix(rnorm(10), ncol=100)
m2 <- matrix(rnorm(10), ncol=100)
colnames(m1)=paste('m1_',1:100,sep='')
colnames(m2)=paste('m2_',1:100,sep='')

# Combinations
combs=expand.grid(colnames(m1),colnames(m2))

# ---
# Option 1
#
system.time(apply(combs,1,function(x)
cor.test(m1[,x[1]],m2[,x[2]])$p.value)->pvalues1)
#   user  system elapsed
#   8.12    0.01    8.20

# ---
# Option 2
#
require(Hmisc)
system.time(apply(combs,1,function(x)
rcorr(m1[,x[1]],m2[,x[2]])$P[2])->pvalues2)
#   user  system elapsed
#   7.00    0.00    7.02


HTH,

Jorge


# -  Session Info 
R version 2.8.0 Patched (2008-11-08 r46864)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base





Re: [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices

2008-11-26 Thread hadley wickham
You can make it ~3 times faster by vectorising the testing:

m1 <- matrix(rnorm(1), ncol=100)
m2 <- matrix(rnorm(1), ncol=100)

system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1,
function(y) { cor.test(x,y)$p.value })}))


system.time({
r <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor(x,y) })})

# each correlation is computed over one row, i.e. ncol(m1) observations
df <- ncol(m1) - 2
t <- sqrt(df) * r / sqrt(1 - r ^ 2)
p <- pt(t, df)
p <- 2 * pmin(p, 1 - p)
})


all.equal(cor.pvalues, p)


You can make cor much faster by stripping away all the error-checking
code and calling the internal C function directly (suggested by the
Rprof output):


system.time({
r <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor(x,y) })})
})

system.time({
r2 <- apply(m1, 1, function(x) { apply(m2, 1, function(y) {
.Internal(cor(x, y, 4L, FALSE)) })})
})

1.5 s vs 0.2 s on my computer.  Combining both changes gives me a ~25-fold
speedup. I suspect you can do even better if you think about
what calculations are being duplicated in the computation of the
correlations.
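
One possible reading of that last suggestion, offered as a sketch and not as Hadley's own code: cor() accepts two matrices and returns every column-by-column correlation in a single call, so transposing m1 and m2 removes the double apply altogether. The matrix sizes below are illustrative assumptions, and the result is checked against the slow cor.test version.

```r
set.seed(42)
m1 <- matrix(rnorm(30 * 100), ncol = 100)  # 30 rows, illustrative size
m2 <- matrix(rnorm(40 * 100), ncol = 100)  # 40 rows, illustrative size

# Reference: one cor.test() call per pair of rows (slow).
p_ref <- apply(m1, 1, function(x) {
  apply(m2, 1, function(y) cor.test(x, y)$p.value)
})

# cor() correlates columns, so transpose to correlate rows.
# The result is nrow(m2) x nrow(m1), the same layout as p_ref.
r <- cor(t(m2), t(m1))

n     <- ncol(m1)                      # observations per correlation
df    <- n - 2
tstat <- sqrt(df) * r / sqrt(1 - r^2)
p_fast <- 2 * pt(-abs(tstat), df)      # two-sided p-value

all.equal(p_ref, p_fast)
```

Here the mean and variance of each row are computed once inside cor() instead of once per pair, which is the duplicated work the double loop repeats.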

Hadley

-- 
http://had.co.nz/



Re: [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices

2008-11-26 Thread Daren Tan

Out of desperation, I wrote the following function, but Hadley beat me to it
:P. Thanks, everyone, for the great help.
 

cor.p.values <- function(r, n) {
  df <- n - 2
  STATISTIC <- c(sqrt(df) * r / sqrt(1 - r^2))
  p <- pt(STATISTIC, df)
  return(2 * pmin(p, 1 - p))
}
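
A brief usage sketch for this function, checked against cor.test on a single pair of vectors. The data and the variable names x, y, p_formula and p_test are made up for illustration.

```r
cor.p.values <- function(r, n) {
  df <- n - 2
  STATISTIC <- c(sqrt(df) * r / sqrt(1 - r^2))
  p <- pt(STATISTIC, df)
  return(2 * pmin(p, 1 - p))
}

set.seed(7)
x <- rnorm(100)
y <- 0.3 * x + rnorm(100)              # hypothetical correlated data

p_formula <- cor.p.values(cor(x, y), length(x))
p_test    <- cor.test(x, y)$p.value
all.equal(p_formula, p_test)

# A matrix of correlations is accepted too, but c() drops its dimensions,
# so the p-values come back as a plain vector in column-major order.
r_mat <- matrix(c(0.1, 0.2, 0.3, 0.4), nrow = 2)
p_vec <- cor.p.values(r_mat, 10)
```

If the matrix shape is needed, it can be restored afterwards with dim(p_vec) <- dim(r_mat).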



Re: [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices

2008-11-26 Thread Thomas Lumley


You can do much better by doing the correlations as a matrix operation:

system.time({
  m1<-scale(m1)
  m2<-scale(m2)
  r<-crossprod(m1,m2)/100
  df<-100
  tstat<-sqrt(df)*r/sqrt(1-r^2)
  p<-pt(tstat,df)
  })
   user  system elapsed
  0.025   0.004   0.028

There might be a factor of n/(n-1) missing somewhere, which would be 
fixable if you could bring yourself to care about it.
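
For anyone who does care about that factor, here is one way the pieces might be tightened up. This is a variant sketch, not Thomas's code, and it rests on a few stated assumptions: the correlations are wanted between rows (as in the original question), so the matrices are transposed before scale(), which standardises columns; the cross-product is divided by n - 1 because scale() uses the n - 1 standard deviation; the t statistic has n - 2 degrees of freedom; and the p-value is two-sided. Matrix sizes are illustrative.

```r
set.seed(99)
m1 <- matrix(rnorm(30 * 100), ncol = 100)  # 30 rows, illustrative size
m2 <- matrix(rnorm(40 * 100), ncol = 100)  # 40 rows, illustrative size
n  <- ncol(m1)                             # observations per correlation

z1 <- scale(t(m1))                 # n x nrow(m1); each column has mean 0, sd 1
z2 <- scale(t(m2))                 # n x nrow(m2)
r  <- crossprod(z1, z2) / (n - 1)  # nrow(m1) x nrow(m2) correlation matrix

df    <- n - 2
tstat <- sqrt(df) * r / sqrt(1 - r^2)
p     <- 2 * pt(-abs(tstat), df)   # two-sided p-values

# Sanity checks against the built-in functions.
all.equal(r, cor(t(m1), t(m2)))
all.equal(p[1, 1], cor.test(m1[1, ], m2[1, ])$p.value)
```

With the n - 1 divisor the cross-product reproduces cor() to within floating-point error, so the p-values agree with cor.test as well.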


-thomas




Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle



[R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices

2008-11-25 Thread Daren Tan

My two matrices are roughly the sizes of m1 and m2. I tried using two apply
calls and cor.test to compute the correlation p.values. After more than an
hour, the code is still running. Please help me make it more efficient.

m1 <- matrix(rnorm(10), ncol=100)
m2 <- matrix(rnorm(1000), ncol=100)

cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, function(y) {
cor.test(x,y)$p.value }) })
