[R] Optimising code
Hi, Does anyone have any advice on speeding up R functions (short of re-implementing them in C :-) )? I have a function that applies a wilcoxon test to 12 sets of about a quarter of a million pairs (and takes about 3 hours). I've replaced the inner loop I had originally with a function call via mapply, and also considered different approximations of the wilcoxon, rather than that which is implemented in wilcox.test, but that makes little difference 9and if anything slows things down :-). Cheers, Crispin This email is confidential and intended solely for the use o...{{dropped}} __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] Optimising code
Crispin Miller wrote: Hi, Does anyone have any advice on speeding up R functions (short of re-implementing them in C :-) )? Some strategies are in every good book on S/R. I have a function that applies a wilcoxon test to 12 sets of about a quarter of a million pairs ... and let me guess: everything is significiant to an almost arbitrary value of \alpha? (and takes about 3 hours). I've replaced the inner loop I had originally with a function call via mapply, and also considered different approximations of the wilcoxon, rather than that which is implemented in wilcox.test, but that makes little difference 9and if anything slows things down :-). Are you using wilcox.test(), or a self written function? Uwe Ligges Cheers, Crispin __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
FW: [R] Optimising code
I have a function that applies a wilcoxon test to 12 sets of about a quarter of a million pairs ... and let me guess: everything is significiant to an almost arbitrary value of \alpha? :-) For each of quarter of a million sets, I do a wilcoxon between two pairs each containing twenty numbers... I do this 12 times... (and takes about 3 hours). I've replaced the inner loop I had originally with a function call via mapply, and also considered different approximations of the wilcoxon, rather than that which is implemented in wilcox.test, but that makes little difference 9and if anything slows things down :-). Are you using wilcox.test(), or a self written function? I started off with wilcox.test, and tried to speed it up by hacking out as much from the original function as possible, since a fair proportion of the code being executed was logic deciding between the different flavours of the test (I didn't expect that to make much difference, but tried it anyway), there was also a call to pnorm() which I replaced with a numeric approximation. This is not inlined into the original function. Crispin This email is confidential and intended solely for the use o...{{dropped}} __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: FW: [R] Optimising code
Crispin Miller wrote: I have a function that applies a wilcoxon test to 12 sets of about a quarter of a million pairs ... and let me guess: everything is significiant to an almost arbitrary value of \alpha? :-) For each of quarter of a million sets, I do a wilcoxon between two pairs each containing twenty numbers... I do this 12 times... Ah. Sorry for misunderstanding. (and takes about 3 hours). I've replaced the inner loop I had originally with a function call via mapply, and also considered different approximations of the wilcoxon, rather than that which is implemented in wilcox.test, but that makes little difference 9and if anything slows things down :-). Are you using wilcox.test(), or a self written function? I started off with wilcox.test, and tried to speed it up by hacking out as much from the original function as possible, since a fair proportion of the code being executed was logic deciding between the different flavours of the test (I didn't expect that to make much difference, but tried it anyway), there was also a call to pnorm() which I replaced with a numeric approximation. This is not inlined into the original function. In that case you cannot do very much except for writing it in a language like C / Fortran, if you really need the speed. Uwe Ligges Crispin This email is confidential and intended solely for the use o...{{dropped}} __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
RE: FW: [R] Optimising code
Yup, my mistake for not being very clear! Alas the computer is already very fast - looks like it's C for me... -Original Message- From: Patrick Burns [mailto:[EMAIL PROTECTED] Sent: 07 October 2003 12:00 Cc: Crispin Miller Subject: Re: FW: [R] Optimising code I had misunderstood also. I agree with Uwe either move to C or buy a faster computer -- or at least another one. Pat Uwe Ligges wrote: Crispin Miller wrote: I have a function that applies a wilcoxon test to 12 sets of about a quarter of a million pairs ... and let me guess: everything is significiant to an almost arbitrary value of \alpha? :-) For each of quarter of a million sets, I do a wilcoxon between two pairs each containing twenty numbers... I do this 12 times... Ah. Sorry for misunderstanding. (and takes about 3 hours). I've replaced the inner loop I had originally with a function call via mapply, and also considered different approximations of the wilcoxon, rather than that which is implemented in wilcox.test, but that makes little difference 9and if anything slows things down :-). Are you using wilcox.test(), or a self written function? I started off with wilcox.test, and tried to speed it up by hacking out as much from the original function as possible, since a fair proportion of the code being executed was logic deciding between the different flavours of the test (I didn't expect that to make much difference, but tried it anyway), there was also a call to pnorm() which I replaced with a numeric approximation. This is not inlined into the original function. In that case you cannot do very much except for writing it in a language like C / Fortran, if you really need the speed. Uwe Ligges Crispin This email is confidential and intended solely for the use o...{{dropped}} __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help This email is confidential and intended solely for the use o...{{dropped}} __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
RE: FW: [R] Optimising code
If you are only interested in calculating the wilcoxon statistic (as one would with calculating permutated p-values for example), the following should suffice. length.na - function(x, ...){ tmp - !(is.na(x) | is.infinite(x)) length(x[tmp], ...) } rank.na - function(x){ y - x[!is.na(x)]; x[!is.na(x)] - rank(y); return(x) } wstats.fn - function(data=NULL, g1=NULL, g2=NULL){ temp - t( apply(data, 1, rank.na)); n1 - apply(temp[ ,g1], 1, length.na); n2 - apply(temp[ ,g2], 1, length.na); wstats - rowSums( temp[,g1], na.rm=TRUE ) - n1*(n2 + 1)/2; return(wstats); } m - matrix( rnorm( 2500*100 ), ncol=100 ) w1 - wstats.fn(data=m, g1=1:50, g2=51:100) w2 - apply(m, 1, function(x) wilcox.test( x[1:50], x[51:100] )$stat) On my machine, it take 2.2s to calculate w1 compared to 15.6s for w2. rank.na and length.na are not necessary if your data does not have missing values. Since you are doing large enough times it might be wise to code in C. My C version of this is at least 20-40 times faster than the one in R but like the one above it does not return p-values. -- Adaikalavan Ramasamy -Original Message- From: Uwe Ligges [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 07, 2003 6:18 PM To: Crispin Miller Cc: R-help (E-mail) Subject: Re: FW: [R] Optimising code Crispin Miller wrote: I have a function that applies a wilcoxon test to 12 sets of about a quarter of a million pairs ... and let me guess: everything is significiant to an almost arbitrary value of \alpha? :-) For each of quarter of a million sets, I do a wilcoxon between two pairs each containing twenty numbers... I do this 12 times... Ah. Sorry for misunderstanding. (and takes about 3 hours). I've replaced the inner loop I had originally with a function call via mapply, and also considered different approximations of the wilcoxon, rather than that which is implemented in wilcox.test, but that makes little difference 9and if anything slows things down :-). Are you using wilcox.test(), or a self written function? I started off with wilcox.test, and tried to speed it up by hacking out as much from the original function as possible, since a fair proportion of the code being executed was logic deciding between the different flavours of the test (I didn't expect that to make much difference, but tried it anyway), there was also a call to pnorm() which I replaced with a numeric approximation. This is not inlined into the original function. In that case you cannot do very much except for writing it in a language like C / Fortran, if you really need the speed. Uwe Ligges Crispin This email is confidential and intended solely for the use o...{{dropped}} __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help