R's findInterval can also take advantage of a sorted x vector. E.g., in R-3.0.0 on the same 8-core Linux box:
> x <- rexp(1e6, 2)
> system.time(for(i in 1:100)tabulate(findInterval(x, c(-Inf, .3, .5, Inf)))[2])
   user  system elapsed
  2.444   0.000   2.446
> xs <- sort(x)
> system.time(for(i in 1:100)tabulate(findInterval(xs, c(-Inf, .3, .5, Inf)))[2])
   user  system elapsed
  1.472   0.000   1.475
>
> tabulate(findInterval(xs, c(-Inf, .3, .5, Inf)))[2]
[1] 180636
> sum( xs > .3 & xs <= .5 )
[1] 180636

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: Martin Morgan [mailto:mtmor...@fhcrc.org]
> Sent: Friday, April 26, 2013 1:33 PM
> To: William Dunlap
> Cc: lcn; Mikhail Umorin; r-help@r-project.org
> Subject: Re: [R] speed of a vector operation question
>
> A very similar question was asked on StackOverflow (by Mikhail? and then I guess
> the answers there were somehow not satisfactory...)
>
> http://stackoverflow.com/questions/16213029/more-efficient-strategy-for-which-or-match
>
> where it turns out that a binary search (implemented in R) on the sorted vector
> is much faster than sum, etc. I guess because it's log N without copying. The
> more complicated condition x > .3 & x < .5 could be satisfied with multiple
> calls to the search.
>
> Martin
>
> On 04/26/2013 01:20 PM, William Dunlap wrote:
> >
> >> I think the sum way is the best.
> >
> > On my Linux machine running R-3.0.0 the sum way is slightly faster:
> > > x <- rexp(1e6, 2)
> > > system.time(for(i in 1:100)sum(x>.3 & x<.5))
> >    user  system elapsed
> >   4.664   0.340   5.018
> > > system.time(for(i in 1:100)length(which(x>.3 & x<.5)))
> >    user  system elapsed
> >   5.017   0.160   5.186
> >
> > If you are doing many of these counts on the same dataset you
> > can save time by using functions like cut(), table(), ecdf(), and
> > findInterval().
> > E.g.,
> > > system.time(r1 <- vapply(seq(0,1,by=1/128)[-1], function(i)sum(x>(i-1/128) & x<=i), FUN.VALUE=0L))
> >    user  system elapsed
> >   5.332   0.568   5.909
> > > system.time(r2 <- table(cut(x, seq(0,1,by=1/128))))
> >    user  system elapsed
> >   0.500   0.008   0.511
> > > all.equal(as.vector(r1), as.vector(r2))
> > [1] TRUE
> >
> > You should do the timings yourself, as the relative speeds will depend
> > on the version or dialect of the R interpreter and how it was compiled.
> > E.g., with the current development version of 'TIBCO Enterprise Runtime for
> > R' (aka 'TERR') on this same 8-core Linux box the sum way is considerably
> > faster than the length(which) way:
> > > x <- rexp(1e6, 2)
> > > system.time(for(i in 1:100)sum(x>.3 & x<.5))
> >    user  system elapsed
> >    1.87    0.03    0.48
> > > system.time(for(i in 1:100)length(which(x>.3 & x<.5)))
> >    user  system elapsed
> >    3.21    0.04    0.83
> > > system.time(r1 <- vapply(seq(0,1,by=1/128)[-1], function(i)sum(x>(i-1/128) & x<=i), FUN.VALUE=0L))
> >    user  system elapsed
> >    2.19    0.04    0.56
> > > system.time(r2 <- table(cut(x, seq(0,1,by=1/128))))
> >    user  system elapsed
> >    0.27    0.01    0.13
> > > all.equal(as.vector(r1), as.vector(r2))
> > [1] TRUE
> >
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
> >
> >
> >> -----Original Message-----
> >> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of lcn
> >> Sent: Friday, April 26, 2013 12:09 PM
> >> To: Mikhail Umorin
> >> Cc: r-help@r-project.org
> >> Subject: Re: [R] speed of a vector operation question
> >>
> >> I think the sum way is the best.
> >>
> >>
> >> On Fri, Apr 26, 2013 at 9:12 AM, Mikhail Umorin <mike...@gmail.com> wrote:
> >>
> >>> Hello,
> >>>
> >>> I am dealing with numeric vectors 10^5 to 10^6 elements long. The values are
> >>> sorted (with duplicates) in the vector (v).
> >>> I am obtaining the length of
> >>> vectors such as (v < c) or (v > c1 & v < c2), where c, c1, c2 are some scalar
> >>> variables. What is the most efficient way to do this?
> >>>
> >>> I am using sum(v < c) since TRUE's are 1's and FALSE's are 0's. This seems to
> >>> me more efficient than length(which(v < c)), but, please, correct me if I'm
> >>> wrong. So, is there anything faster than what I already use?
> >>>
> >>> I'm running R 2.14.2 on Linux kernel 3.4.34.
> >>>
> >>> I appreciate your time,
> >>>
> >>> Mikhail
> >>> [[alternative HTML version deleted]]
> >>>
> >>> ______________________________________________
> >>> R-help@r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
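
For reference, the "multiple calls to the search" idea mentioned in the thread can be sketched roughly as below. This is only an illustration, not the code from the StackOverflow answer: count_below() and count_between() are hypothetical helper names, and the sketch assumes vs is already sorted in increasing order and contains no NAs. Two bisections give the count for (v > c1 & v < c2) without building a logical vector of length(vs).

```r
# Hypothetical helpers, assuming 'vs' is sorted increasingly and has no NAs.

# Number of elements of vs that are < value (strict = TRUE)
# or <= value (strict = FALSE), found by bisection in O(log n) comparisons.
count_below <- function(vs, value, strict = TRUE) {
  lo <- 0L           # invariant: vs[1..lo] all satisfy the condition
  hi <- length(vs)   # invariant: vs[(hi+1)..n] all fail the condition
  while (lo < hi) {
    mid <- (lo + hi + 1L) %/% 2L
    ok <- if (strict) vs[mid] < value else vs[mid] <= value
    if (ok) lo <- mid else hi <- mid - 1L
  }
  lo
}

# Number of elements with c1 < v < c2, using two searches.
count_between <- function(vs, c1, c2) {
  max(count_below(vs, c2, strict = TRUE) -
      count_below(vs, c1, strict = FALSE), 0L)
}

# Small check against the sum() approach; rounding forces duplicates.
set.seed(1)
vs <- sort(round(rexp(1e5, 2), 2))
stopifnot(count_below(vs, 0.3) == sum(vs < 0.3))
stopifnot(count_between(vs, 0.3, 0.5) == sum(vs > 0.3 & vs < 0.5))
```

The strict/non-strict switch matters only when the data contain ties at the cut points, as sorted data with duplicates may. findInterval(c, vs) returns the same "<=" count for a sorted vs, but as far as I know it first checks that vs is sorted, which is itself a pass over the whole vector, so time it yourself before choosing.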