Re: FW: [R] Optimising code

2003-10-07 Thread Uwe Ligges
Crispin Miller wrote:

I have a function that applies a wilcoxon test to 12 sets of about a quarter of a million pairs


... and let me guess: everything is significiant to an almost arbitrary 
value of \alpha?


:-) For each of quarter of a million sets, I do a wilcoxon between two pairs each 
containing twenty numbers...
I do this 12 times...
Ah. Sorry for misunderstanding.


(and takes about 3 hours). I've replaced the inner loop I 
had originally with a function call via mapply, and also 
considered different approximations  of the wilcoxon, rather 
than that which is implemented in wilcox.test, but that makes 
little difference 9and if anything slows things down :-).

Are you using wilcox.test(), or a self written function?



I started off with wilcox.test, and tried to speed it up by hacking out as much from the original function as possible, since a fair proportion of the code being executed was logic deciding between the different flavours of the test (I didn't expect that to make much difference, but tried it anyway), there was also a call to pnorm() which I replaced with a numeric approximation. This is not inlined into the original function.


In that case you cannot do very much except for writing it in a language 
like C / Fortran, if you really need the speed.

Uwe Ligges

Crispin
 


 
This email is confidential and intended solely for the use o...{{dropped}}

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


RE: FW: [R] Optimising code

2003-10-07 Thread Crispin Miller
Yup, my mistake for not being very clear! Alas the computer is already very fast - 
looks like it's C for me...

 -Original Message-
 From: Patrick Burns [mailto:[EMAIL PROTECTED]
 Sent: 07 October 2003 12:00
 Cc: Crispin Miller
 Subject: Re: FW: [R] Optimising code
 
 
 I had misunderstood also.  I agree with Uwe either move
 to C or buy a faster computer -- or at least another one.
 
 Pat
 
 Uwe Ligges wrote:
 
  Crispin Miller wrote:
 
  I have a function that applies a wilcoxon test to 12 
 sets of about 
  a quarter of a million pairs
 
 
 
  ... and let me guess: everything is significiant to an almost 
  arbitrary value of \alpha?
 
 
 
  :-) For each of quarter of a million sets, I do a wilcoxon between 
  two pairs each containing twenty numbers...
  I do this 12 times...
 
 
  Ah. Sorry for misunderstanding.
 
 
  (and takes about 3 hours). I've replaced the inner loop I 
 
 
  had originally with a function call via mapply, and also 
 considered 
  different approximations  of the wilcoxon, rather than 
 that which is 
  implemented in wilcox.test, but that makes little 
 difference 9and if 
  anything slows things down :-).
 
  Are you using wilcox.test(), or a self written function?
 
 
 
  I started off with wilcox.test, and tried to speed it up 
 by hacking 
  out as much from the original function as possible, since a fair 
  proportion of the code being executed was logic deciding 
 between the 
  different flavours of the test (I didn't expect that to make much 
  difference, but tried it anyway), there was also a call to pnorm() 
  which I replaced with a numeric approximation. This is not inlined 
  into the original function.
 
 
 
  In that case you cannot do very much except for writing it in a 
  language like C / Fortran, if you really need the speed.
 
  Uwe Ligges
 
  Crispin
   
  
 
   
  This email is confidential and intended solely for the use 
  o...{{dropped}}
 
  __
  [EMAIL PROTECTED] mailing list
  https://www.stat.math.ethz.ch/mailman/listinfo/r-help
 
 
  __
  [EMAIL PROTECTED] mailing list
  https://www.stat.math.ethz.ch/mailman/listinfo/r-help
 
 
 
 
 

 


 
This email is confidential and intended solely for the use o...{{dropped}}

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


RE: FW: [R] Optimising code

2003-10-07 Thread Adaikalavan RAMASAMY
If you are only interested in calculating the wilcoxon statistic (as one
would with calculating permutated p-values for example), the following
should suffice.

length.na - function(x, ...){
  tmp - !(is.na(x) | is.infinite(x))
  length(x[tmp], ...)
}

rank.na - function(x){
  y - x[!is.na(x)]; 
  x[!is.na(x)] - rank(y);
  return(x)
}

wstats.fn - function(data=NULL, g1=NULL, g2=NULL){
  temp - t( apply(data, 1, rank.na));
  n1 - apply(temp[ ,g1], 1, length.na);
  n2 - apply(temp[ ,g2], 1, length.na);

  wstats - rowSums( temp[,g1], na.rm=TRUE ) - n1*(n2 + 1)/2;
  return(wstats);
}

m - matrix( rnorm( 2500*100 ), ncol=100 )
w1 - wstats.fn(data=m, g1=1:50, g2=51:100)
w2 - apply(m, 1, function(x) wilcox.test( x[1:50], x[51:100] )$stat)

On my machine, it take 2.2s to calculate w1 compared to 15.6s for w2.
rank.na and length.na are not necessary if your data does not have
missing values. 

Since you are doing large enough times it might be wise to code in C. My
C version of this is at least 20-40 times faster than the one in R but
like the one above it does not return p-values.


--
Adaikalavan Ramasamy 


-Original Message-
From: Uwe Ligges [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 07, 2003 6:18 PM
To: Crispin Miller
Cc: R-help (E-mail)
Subject: Re: FW: [R] Optimising code


Crispin Miller wrote:

I have a function that applies a wilcoxon test to 12 sets of about a 
quarter of a million pairs
 
 
... and let me guess: everything is significiant to an almost 
arbitrary
value of \alpha?
 
 
 :-) For each of quarter of a million sets, I do a wilcoxon between two

 pairs each containing twenty numbers... I do this 12 times...

Ah. Sorry for misunderstanding.


(and takes about 3 hours). I've replaced the inner loop I

had originally with a function call via mapply, and also
considered different approximations  of the wilcoxon, rather 
than that which is implemented in wilcox.test, but that makes 
little difference 9and if anything slows things down :-).

Are you using wilcox.test(), or a self written function?

 
 
 I started off with wilcox.test, and tried to speed it up by hacking 
 out as much from the original function as possible, since a fair 
 proportion of the code being executed was logic deciding between the 
 different flavours of the test (I didn't expect that to make much 
 difference, but tried it anyway), there was also a call to pnorm() 
 which I replaced with a numeric approximation. This is not inlined 
 into the original function.


In that case you cannot do very much except for writing it in a language

like C / Fortran, if you really need the speed.

Uwe Ligges

 Crispin
  
 
 
  
 This email is confidential and intended solely for the use 
 o...{{dropped}}
 
 __
 [EMAIL PROTECTED] mailing list 
 https://www.stat.math.ethz.ch/mailman/listinfo/r-help

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help