Re: [R] speed of a vector operation question

Martin Morgan Fri, 26 Apr 2013 13:34:49 -0700

A very similar question was asked on StackOverflow (by Mikhail? and then I guess the answers there were somehow not satisfactory...)


http://stackoverflow.com/questions/16213029/more-efficient-strategy-for-which-or-match

where it turns out that a binary search (implemented in R) on the sorted vector is much faster than sum, etc. I guess because it's O(log N) without copying. The more complicated condition x > .3 & x < .5 could be satisfied with multiple calls to the search.
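A minimal sketch of the binary-search idea, assuming v is already sorted in non-decreasing order. The helper names count_lt, count_le and count_between are hypothetical; this is not the code from the StackOverflow thread. Each count takes O(log N) comparisons and allocates no logical vector, and the two-sided condition is the difference of two searches:

```r
# Hypothetical helpers; v must already be sorted in non-decreasing order.
# Number of elements of v strictly less than c0 (binary search, O(log N)).
count_lt <- function(v, c0) {
  lo <- 0L
  hi <- length(v)
  # Invariant: v[1..lo] < c0 and v[(hi+1)..length(v)] >= c0
  while (lo < hi) {
    mid <- (lo + hi) %/% 2L
    if (v[mid + 1L] < c0) lo <- mid + 1L else hi <- mid
  }
  lo
}

# Number of elements of v less than or equal to c0.
count_le <- function(v, c0) {
  lo <- 0L
  hi <- length(v)
  while (lo < hi) {
    mid <- (lo + hi) %/% 2L
    if (v[mid + 1L] <= c0) lo <- mid + 1L else hi <- mid
  }
  lo
}

# Number of elements with c1 < v < c2: two searches, 0 if the range is empty.
count_between <- function(v, c1, c2) {
  max(0L, count_lt(v, c2) - count_le(v, c1))
}

# Check against the sum() idiom on a sorted vector like the one in the question
set.seed(1)
x <- sort(rexp(1e6, 2))
stopifnot(count_lt(x, 0.3) == sum(x < 0.3),
          count_between(x, 0.3, 0.5) == sum(x > 0.3 & x < 0.5))
```

The sort is only worth paying for if the vector is already sorted (as in the question) or is queried many times.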


Martin

On 04/26/2013 01:20 PM, William Dunlap wrote:

> I think the sum way is the best.


On my Linux machine running R-3.0.0 the sum way is slightly faster:
   > x <- rexp(1e6, 2)
   > system.time(for(i in 1:100)sum(x>.3 & x<.5))
      user  system elapsed
     4.664   0.340   5.018
   > system.time(for(i in 1:100)length(which(x>.3 & x<.5)))
      user  system elapsed
     5.017   0.160   5.186

If you are doing many of these counts on the same dataset you
can save time by using functions like cut(), table(), ecdf(), and
findInterval().  E.g.,

   > system.time(r1 <- vapply(seq(0,1,by=1/128)[-1],
   +                          function(i) sum(x>(i-1/128) & x<=i), FUN.VALUE=0L))
      user  system elapsed
     5.332   0.568   5.909
   > system.time(r2 <- table(cut(x, seq(0,1,by=1/128))))
      user  system elapsed
     0.500   0.008   0.511
   > all.equal(as.vector(r1), as.vector(r2))
   [1] TRUE
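One way to follow the findInterval() suggestion, sketched under the assumption that the bins are the same (a, b] intervals that cut() uses by default: sort x once, count the elements <= each break with findInterval(), and take successive differences. The names breaks, xs and r3 are illustrative only:

```r
x <- rexp(1e6, 2)
breaks <- seq(0, 1, by = 1/128)

xs <- sort(x)    # one O(N log N) sort, reusable for any later queries
# findInterval(b, xs) is the number of elements of xs that are <= b,
# so successive differences count the x in (breaks[k-1], breaks[k]]
r3 <- diff(findInterval(breaks, xs))

r2 <- table(cut(x, breaks))
stopifnot(isTRUE(all.equal(as.vector(r2), r3)))
```

Once xs exists, each additional count costs one binary search rather than a pass over all of x.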

You should do the timings yourself, as the relative speeds will depend
on the version or dialect of the R interpreter and how it was compiled.
E.g., with the current development version of 'TIBCO Enterprise Runtime for R'
(aka 'TERR') on this same 8-core Linux box, the sum way is considerably faster
than the length(which) way:
   > x <- rexp(1e6, 2)
   > system.time(for(i in 1:100)sum(x>.3 & x<.5))
      user  system elapsed
      1.87    0.03    0.48
   > system.time(for(i in 1:100)length(which(x>.3 & x<.5)))
      user  system elapsed
      3.21    0.04    0.83
   > system.time(r1 <- vapply(seq(0,1,by=1/128)[-1],
   +                          function(i) sum(x>(i-1/128) & x<=i), FUN.VALUE=0L))
      user  system elapsed
      2.19    0.04    0.56
   > system.time(r2 <- table(cut(x, seq(0,1,by=1/128))))
      user  system elapsed
      0.27    0.01    0.13
   > all.equal(as.vector(r1), as.vector(r2))
   [1] TRUE
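Since the relative speeds depend on the build, a small helper like the one below may make it easier to rerun the sum versus length(which) comparison on your own interpreter. The name time_counts is hypothetical; it reports elapsed seconds only, and the numbers will differ from machine to machine:

```r
# Hypothetical helper: time two ways of counting the elements of x in (lo, hi)
# on the current interpreter; returns elapsed seconds for `reps` repetitions.
time_counts <- function(x, lo, hi, reps = 100L) {
  methods <- list(
    sum          = function() sum(x > lo & x < hi),
    length_which = function() length(which(x > lo & x < hi))
  )
  vapply(methods, function(f) {
    system.time(for (i in seq_len(reps)) f())[["elapsed"]]
  }, FUN.VALUE = numeric(1))
}

x <- rexp(1e6, 2)
time_counts(x, 0.3, 0.5)
```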

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of lcn
Sent: Friday, April 26, 2013 12:09 PM
To: Mikhail Umorin
Cc: r-help@r-project.org
Subject: Re: [R] speed of a vector operation question

I think the sum way is the best.


On Fri, Apr 26, 2013 at 9:12 AM, Mikhail Umorin <mike...@gmail.com> wrote:

Hello,

I am dealing with numeric vectors 10^5 to 10^6 elements long. The values are
sorted (with duplicates) in the vector (v). I am obtaining the length of
vectors such as (v < c) or (v > c1 & v < c2), where c, c1, c2 are some scalar
variables. What is the most efficient way to do this?

I am using sum(v < c) since TRUEs are 1s and FALSEs are 0s. This seems to
me more efficient than length(which(v < c)), but, please, correct me if I'm
wrong. So, is there anything faster than what I already use?
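One difference between the two idioms that matters beyond speed: which() drops NA results, whereas sum() propagates them unless na.rm = TRUE is given. A small illustration (the vector v here is made up, not from the thread):

```r
v  <- c(0.1, 0.4, NA, 0.7)
lt <- v < 0.5              # TRUE TRUE NA FALSE
sum(lt)                    # NA: one NA comparison makes the whole sum NA
sum(lt, na.rm = TRUE)      # 2: logicals are coerced to 0/1, NAs dropped
length(which(lt))          # 2: which() treats NA as not TRUE
```

For NA-free data like the sorted vectors in the question, both idioms give the same count.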

I'm running R 2.14.2 on Linux kernel 3.4.34.

I appreciate your time,

Mikhail

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
