Also, your request can easily be formulated as an SQL statement, for example using the 'sqldf' package:
----
library(sqldf)

# Fragments to be covered
a1 <- data.frame(id = 1:6,
                 cat = paste('cat', rep(1:3, c(2,3,1))),
                 st = c(1, 7, 30, 40, 59, 91),
                 en = c(5, 25, 39, 55, 70, 120))

# Probes
a2 <- data.frame(id = paste('probe', 1:8),
                 cat = paste('cat', rep(1:3, c(2,3,3))),
                 st = c(1, 9, 20, 38, 53, 70, 80, 95),
                 en = c(6, 15, 36, 43, 58, 75, 85, 98))

# Count, per fragment, the probes of the same cat that overlap it
sqldf("select a1.id as id, count(*)
       from a1, a2
       where a1.cat = a2.cat and a2.st <= a1.en and a2.en >= a1.st
       group by a1.id")
#   id  count(*)
#    1         1
#    2         1
#    3         2
#    4         2
#    6         1
# (fragment 5 has no overlapping probe, so the join returns no row for it)
----

Of course, there is some overhead in generating the SQLite tables. I would therefore very much like to hear whether this gives a significant improvement on your data, or the contrary.

// Hans Werner Borchers


Anh Tran-2 wrote:
>
> Hi all,
> I know this topic has come up multiple times, but I've never fully
> understood the apply() function.
>
> Anyway, I'm here asking for your help again to convert this loop to
> apply().
>
> I have 2 data frames with the following information: a1 holds the
> fragments that need to be covered, a2 holds the probes that cover the
> specific fragments.
>
> I need to count the number of probes covering every given fragment (they
> need to have the same cat ID to be on the same fragment).
>
> a1<-data.frame(id=c(1:6), cat=c('cat 1','cat 1','cat 2','cat 2','cat 2','cat 3'),
>   st=c(1,7,30,40,59,91), en=c(5,25,39,55,70,120));
> a2<-data.frame(id=paste('probe',c(1:8)), cat=c('cat 1','cat 1','cat 2','cat 2','cat 2','cat 3','cat 3','cat 3'),
>   st=c(1,9,20,38,53,70,80,95), en=c(6,15,36,43,58,75,85,98));
> a1$coverage<-NULL;
>
> I came up with this for loop (basically, if a probe starts before the
> fragment ends, and ends after the fragment starts, it covers that fragment):
>
> for (i in 1:length(a1$id))
> {
>   a1$coverage[i]<-length(a2[a2$st<=a1$en[i]&a2$en>=a1$st[i]&a2$cat==a1$cat[i],]$id);
> }
>
> > a1$coverage
> [1] 1 1 2 2 0 1
>
> This loop runs awfully slow when I have 200,000 probes and 30,000
> fragments. Is there any way I can speed this up with apply()?
>
> This is the time for my for loop to scan through the first 20 records of
> my dataset:
>    user  system elapsed
>   2.264   0.501   2.770
>
> I think there is room for improvement here. Any ideas?
>
> Thanks
> --
> Regards,
> Anh Tran
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
View this message in context: http://www.nabble.com/convert-for-loop-into-apply%28%29-tp18786483p18796799.html
Sent from the R help mailing list archive at Nabble.com.
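
A note on the zero-coverage case: the SQL query in this message returns no row for fragment 5, because a plain join yields nothing for a fragment without any overlapping probe, whereas the original loop reports 0 for it. A minimal base R sketch that keeps the zeros, without SQLite, might look like the following. It uses the example data from this thread; the suffixes '.f' and '.p' and the intermediate names 'm' and 'hit' are only illustrative choices, and it has not been timed on the full 200,000 x 30,000 data set.

```r
# Example data from the thread
a1 <- data.frame(id = 1:6,
                 cat = paste('cat', rep(1:3, c(2, 3, 1))),
                 st = c(1, 7, 30, 40, 59, 91),
                 en = c(5, 25, 39, 55, 70, 120))
a2 <- data.frame(id = paste('probe', 1:8),
                 cat = paste('cat', rep(1:3, c(2, 3, 3))),
                 st = c(1, 9, 20, 38, 53, 70, 80, 95),
                 en = c(6, 15, 36, 43, 58, 75, 85, 98))

# Pair every fragment with every probe of the same cat.
# Clashing column names get the (illustrative) suffixes .f and .p.
m <- merge(a1, a2, by = 'cat', suffixes = c('.f', '.p'))

# TRUE for pairs where the probe overlaps the fragment
hit <- m$st.p <= m$en.f & m$en.p >= m$st.f

# Count hits per fragment; the factor levels keep fragments with zero hits
a1$coverage <- as.vector(table(factor(m$id.f[hit], levels = a1$id)))
a1$coverage
# [1] 1 1 2 2 0 1
```

Note that merge() materializes every fragment/probe pair sharing a cat before filtering, so memory use grows with the size of each category; whether this beats the loop or the sqldf version on the real data would have to be measured.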