Hello,

I have a question about how to speed up t.test on a large dataset. For
example, I have a table "tab" which looks like:

        a       b       c       d       e       f       g       h....
1       
2
3
4
5

...

100000

dim(tab) is 100000 x 100
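
For a self-contained example, a table of that shape could be simulated
like this (random data, purely illustrative; the column names are made
unique so they mimic the a, b, c, ... above):

set.seed(1)  # reproducible random data
tab <- matrix(rnorm(100000 * 100), nrow = 100000, ncol = 100)
## recycle the alphabet to get 100 unique names: a, b, ..., z, a.1, b.1, ...
colnames(tab) <- make.unique(rep(letters, length.out = 100))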



I need to do a t.test for each row on two subsets of columns, i.e. to
compare the group of columns a, b, d against the group e, f, g at each row,
as sketched below.


subset 1:                                       
        a       b       d
1       
2
3
4
5

...

100000


subset 2:
        e       f       g
1       
2
3
4
5

...

100000
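
In code, the two subsets are just column selections. Using the name
vectors v5 and v6 that appear in the loop below:

v5 <- c("a", "b", "d")  # column names in subset 1
v6 <- c("e", "f", "g")  # column names in subset 2
subset1 <- tab[, v5]
subset2 <- tab[, v6]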

    The 100000 row-wise t.tests for one pair of subsets take around 1 min.
The problem is that I have around 10000 different combinations of such
subsets, so it would take 1 min * 10000 = 10000 min if I use a "for" loop
like this:

n1 <- 10000   # number of subset combinations
n2 <- 100000  # number of rows
pvals <- matrix(NA_real_, nrow = n2, ncol = n1)  # one column per combination

for (i1 in 1:n1) {
        ## v5 and v6 are vectors containing the variable names for the two
        ## subsets (they are different for each combination)
        for (i2 in 1:n2) {
                pvals[i2, i1] <- t.test(tab[i2, v5], tab[i2, v6])$p.value
        }
}


My question: is there a more efficient way to do these computations in a
shorter period of time? Any packages, like plyr? Maybe direct calculations
instead of using the t.test function?
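
By "direct calculations" I mean something like the following sketch:
computing Welch's t-statistic (what t.test does by default) for all rows at
once with rowMeans/rowSums, instead of calling t.test 100000 times.
row.t.welch is a name I made up for illustration:

## Vectorized Welch t-test p-values for all rows at once.
## x, y: numeric matrices with the same number of rows (one group each).
row.t.welch <- function(x, y) {
        nx <- ncol(x); ny <- ncol(y)
        mx <- rowMeans(x); my <- rowMeans(y)
        ## row variances via sums of squared deviations
        ## (x - mx recycles mx down the columns, i.e. row-wise)
        vx <- rowSums((x - mx)^2) / (nx - 1)
        vy <- rowSums((y - my)^2) / (ny - 1)
        sx <- vx / nx; sy <- vy / ny
        tstat <- (mx - my) / sqrt(sx + sy)
        ## Welch-Satterthwaite degrees of freedom
        df <- (sx + sy)^2 / (sx^2 / (nx - 1) + sy^2 / (ny - 1))
        2 * pt(-abs(tstat), df)  # two-sided p-value
}

## one subset combination in a single call, no inner loop:
p <- row.t.welch(tab[, v5], tab[, v6])

If this is on the right track, the result should agree row by row with
t.test(tab[i2, v5], tab[i2, v6])$p.value, which is easy to spot-check on a
few rows before trusting it for all 10000 combinations.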


Thank you. 


