Thank you all very much for your time and suggestions. The link to Stack 
Overflow was very helpful. Here are some timings in case someone wants to 
know. (I noticed that microbenchmark results vary depending on how many 
functions one tries to benchmark at a time; however, the "min" stays about 
the same.)

# just to refresh: most of the code below is from the Stack Overflow link
# provided by Martin Morgan:
# http://stackoverflow.com/questions/16213029/more-efficient-strategy-for-which-or-match
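
(These packages are needed for the snippets below; the library() calls were 
implicit in my session, so I am adding them here for completeness.)

library(compiler)        # provides cmpfun()
library(inline)          # provides cfunction()
library(microbenchmark)  # provides microbenchmark()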

f0 <- function(v) length(which(v < 0))   # the original length(which(...)) approach

f1 <- function(v) sum(v < 0)             # sum of the logical vector

f2 <- function(v) which.min(v < 0) - 1L  # index of first non-negative, minus 1
                                         # (note: returns 0 if *all* elements are negative)

f3 <- function(x) { # binary search implemented in R
    imin <- 1L
    imax <- length(x)
    while (imax >= imin) {
        imid <- as.integer(imin + (imax - imin) / 2)
        if (x[imid] >= 0)
            imax <- imid - 1L
        else
            imin <- imid + 1L
    }
    imax
}
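
A note for readers: f3 returns imax, the position of the last element below 
zero in the sorted input, which is exactly the count of values < 0. A tiny 
example:

f3(c(-2, -1, 0, 3))  # 2 (two elements are < 0)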

f3.c <- cmpfun(f3) # byte-compiled version of f3 (compiler::cmpfun)

# binary search in C, via inline::cfunction
f4 <- cfunction(c(x = "numeric"), "
    int imin = 0, imax = Rf_length(x) - 1, imid;
    while (imax >= imin) {
        imid = imin + (imax - imin) / 2;
        if (REAL(x)[imid] >= 0)
            imax = imid - 1;
        else
            imin = imid + 1;
    }
    return ScalarInteger(imax + 1);
")

# this one is a separate suggestion by William Dunlap:
f5 <- function(v) {
  # bin the values at the cut points -Inf, 0, 1, Inf and take the count of
  # the first bin, i.e. the number of elements < 0
  tabulate(findInterval(v, c(-Inf, 0, 1, Inf)))[1]
}
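
Since my original question also involved counts like sum(v > c1 & v < c2) on a 
sorted vector, findInterval() can give both boundary positions directly. This 
is just my own sketch (count_between is a made-up name), and it assumes that 
neither cutoff occurs as a value in v, so strict vs. non-strict boundaries do 
not matter:

# sketch: number of elements of the sorted vector v strictly between c1 and c2,
# assuming c1 and c2 are not themselves values in v
count_between <- function(v, c1, c2)
    findInterval(c2, v) - findInterval(c1, v)

# e.g. count_between(vec, -0.5, 0.5) should agree with sum(vec > -0.5 & vec < 0.5)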

> vec <- c(seq(-100, -1, length.out = 1e6), rep(0, 20), seq(1, 100, length.out = 1e6))
# the identity of results was verified

> microbenchmark(f1(vec), f2(vec), f3(vec), f3.c(vec), f4(vec), f5(vec))
Unit: microseconds
      expr       min         lq    median         uq       max neval
   f1(vec) 17054.233 17831.1385 18514.305 19512.4705 54603.435   100
   f2(vec) 23624.353 25026.4265 26034.785 29322.1150 60014.458   100
   f3(vec)    76.902    93.2340   111.834   116.8370   129.888   100
 f3.c(vec)    21.883    30.7530    37.757    54.1250    62.939   100
   f4(vec)     6.575    10.5885    30.389    31.9385    37.610   100
   f5(vec) 35365.088 36767.6175 38317.103 40671.2000 69209.425   100
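
(The "identity of results" comment above refers to a check along these lines; 
this is just a sketch of the kind of comparison, not the exact code I ran:)

stopifnot(f1(vec) == f2(vec),
          f2(vec) == f3(vec),
          f3(vec) == f3.c(vec),
          f3.c(vec) == f4(vec),
          f4(vec) == f5(vec))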


So, I'll try to go with the inline binary search and see if I can pre-compile 
the more complex conditions.
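
Something like the following is what I have in mind: the same C binary search 
as f4 above, but with the cutoff passed in as an argument instead of 
hard-coded 0 (f4c is my own name for it, and this is an untested sketch):

f4c <- cfunction(c(x = "numeric", cut = "numeric"), "
    double c0 = REAL(cut)[0];   /* the cutoff value */
    int imin = 0, imax = Rf_length(x) - 1, imid;
    while (imax >= imin) {
        imid = imin + (imax - imin) / 2;
        if (REAL(x)[imid] >= c0)
            imax = imid - 1;
        else
            imin = imid + 1;
    }
    return ScalarInteger(imax + 1);  /* number of elements < cut */
")

# e.g. f4c(vec, 0) should match f4(vec)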

Thank you, again, for your help!

Mikhail.




On Friday, April 26, 2013 20:52:27 Suzen, Mehmet wrote:
> Hello Mikhail,
> 
> I would suggest you use the ff package for fast access to large data
> structures:
> 
> http://cran.r-project.org/web/packages/ff/index.html
> http://wsopuppenkiste.wiso.uni-goettingen.de/ff/ff_1.0/inst/doc/ff.pdf
> 
> Best
> 
> Mehmet
> 
> On 26 April 2013 18:12, Mikhail Umorin <mike...@gmail.com> wrote:
> > Hello,
> > 
> > I am dealing with numeric vectors 10^5 to 10^6 elements long. The values
> > are sorted (with duplicates) in the vector (v). I am obtaining the length
> > of vectors such as (v < c) or (v > c1 & v < c2), where c, c1, c2 are some
> > scalar variables. What is the most efficient way to do this?
> > 
> > I am using sum(v < c) since TRUE's are 1's and FALSE's are 0's. This seems
> > to me more efficient than length(which(v < c)), but, please, correct me
> > if I'm wrong. So, is there anything faster than what I already use?
> > 
> > I'm running R 2.14.2 on Linux kernel 3.4.34.
> > 
> > I appreciate your time,
> > 
> > Mikhail
> > 
