Your request can also be formulated quite easily as an SQL statement,
for example using the 'sqldf' package:

----
library(sqldf)

a1 <- data.frame(id = 1:6,
                 cat = paste('cat', rep(1:3, c(2,3,1))),
                 st = c(1, 7, 30, 40, 59, 91),
                 en = c(5, 25, 39, 55, 70, 120))

a2 <- data.frame(id = paste('probe', 1:8),
                 cat = paste('cat', rep(1:3, c(2,3,3))),
                 st = c(1, 9, 20, 38, 53, 70, 80, 95),
                 en = c(6, 15, 36, 43, 58, 75, 85, 98))

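# Join fragments (a1) to probes (a2) of the same category whose intervals
# overlap, then count the matching probes per fragment.
sqldf("select a1.id as id, count(*) from a1, a2 where a1.cat = a2.cat
          and a2.st <= a1.en
          and a2.en >= a1.st
          group by a1.id")

#   id count(*)
#    1        1
#    2        1
#    3        2
#    4        2
#    6        1
#
# Fragment 5 is missing because this inner join returns no row for a
# fragment that no probe overlaps; its coverage is 0.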
----
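
The query above returns no row for a fragment that no probe overlaps (fragment 5 in the sample data), whereas the original loop reports 0 for it. A minimal sketch of one way to keep such fragments, assuming a LEFT JOIN is acceptable here; the column alias `coverage` and the variable `res` are my own names, not part of the original post:

```r
library(sqldf)

# Sample data from the thread
a1 <- data.frame(id = 1:6,
                 cat = paste('cat', rep(1:3, c(2, 3, 1))),
                 st = c(1, 7, 30, 40, 59, 91),
                 en = c(5, 25, 39, 55, 70, 120))

a2 <- data.frame(id = paste('probe', 1:8),
                 cat = paste('cat', rep(1:3, c(2, 3, 3))),
                 st = c(1, 9, 20, 38, 53, 70, 80, 95),
                 en = c(6, 15, 36, 43, 58, 75, 85, 98))

# LEFT JOIN keeps every fragment; count(a2.id) skips the NULLs produced
# for fragments without a matching probe, so those fragments get 0.
res <- sqldf("select a1.id as id, count(a2.id) as coverage
              from a1 left join a2
                on a1.cat = a2.cat
               and a2.st <= a1.en
               and a2.en >= a1.st
              group by a1.id
              order by a1.id")
print(res)
```

With the sample data, the `coverage` column should agree with the loop's result of 1 1 2 2 0 1, so it can be assigned back with `a1$coverage <- res$coverage`.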

Of course, sqldf incurs some overhead in copying the data frames into
SQLite tables. I would therefore very much like to hear whether this
gives a significant improvement on your full data set, or the contrary.
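
One way to check this is to wrap each candidate in system.time(). The sketch below times a base R alternative, not the sqldf query: it splits the probes by category once and then counts overlaps per fragment. The helper name `count_coverage` is hypothetical, and timings on the toy data say nothing about the 200,000-probe case, so no numbers are shown.

```r
# Sample data from the thread
a1 <- data.frame(id = 1:6,
                 cat = paste('cat', rep(1:3, c(2, 3, 1))),
                 st = c(1, 7, 30, 40, 59, 91),
                 en = c(5, 25, 39, 55, 70, 120))

a2 <- data.frame(id = paste('probe', 1:8),
                 cat = paste('cat', rep(1:3, c(2, 3, 3))),
                 st = c(1, 9, 20, 38, 53, 70, 80, 95),
                 en = c(6, 15, 36, 43, 58, 75, 85, 98))

# Hypothetical helper: split the probes by category once, then compare
# each fragment only against the probes of its own category.
count_coverage <- function(frags, probes) {
  by_cat <- split(probes[, c("st", "en")], as.character(probes$cat))
  mapply(function(ct, st, en) {
    p <- by_cat[[ct]]
    if (is.null(p)) return(0L)    # no probes at all in this category
    sum(p$st <= en & p$en >= st)  # same overlap test as the loop
  }, as.character(frags$cat), frags$st, frags$en, USE.NAMES = FALSE)
}

# system.time() reports user, system and elapsed seconds
timing <- system.time(coverage <- count_coverage(a1, a2))
print(coverage)
print(timing)
```

The same system.time() wrapper can be put around the sqldf() call and around the original for loop to compare all three on the real data.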

//  Hans Werner Borchers



Anh Tran-2 wrote:
> 
> Hi all, I know this topic has come up multiple times, but I've never fully
> understood the apply() function.
> 
> Anyway, I'm here asking for your help again to convert this loop to
> apply().
> 
> I have 2 data frames with the following information: a1 holds the
> fragments that need to be covered, a2 holds the probes that cover a
> specific fragment.
> 
> I need to count the number of probes covering each fragment (a probe must
> have the same cat ID as the fragment to count towards it)
> 
> a1<-data.frame(id=c(1:6),
>   cat=c('cat 1','cat 1','cat 2','cat 2','cat 2','cat 3'),
>   st=c(1,7,30,40,59,91), en=c(5,25,39,55,70,120));
> a2<-data.frame(id=paste('probe',c(1:8)),
>   cat=c('cat 1','cat 1','cat 2','cat 2','cat 2','cat 3','cat 3','cat 3'),
>   st=c(1,9,20,38,53,70,80,95), en=c(6,15,36,43,58,75,85,98));
> a1$coverage<-NULL;
> 
> I came up with this for loop (basically, if a probe starts before the
> fragment ends, and ends after the fragment starts, it covers that fragment)
> 
> for (i in 1:length(a1$id))
> {
> a1$coverage[i]<-length(a2[a2$st<=a1$en[i]&a2$en>=a1$st[i]&a2$cat==a1$cat[i],]$id);
> }
> 
>> a1$coverage
> [1] 1 1 2 2 0 1
> 
> 
> This loop runs awfully slow when I have 200,000 probes and 30,000
> fragments. Is there any way I can speed this up with apply()?
> 
> This is the time for my for loop to scan through the first 20 records of my
> dataset:
>    user  system elapsed
>   2.264   0.501   2.770
> 
> I think there is room for improvement here. Any idea?
> 
> Thanks
> -- 
> Regards,
> Anh Tran
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/convert-for-loop-into-apply%28%29-tp18786483p18796799.html
Sent from the R help mailing list archive at Nabble.com.
