On Apr 21, 2010, at 5:19 PM, William Dunlap wrote:

-----Original Message-----
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Jeff Brown
Sent: Wednesday, April 21, 2010 8:08 AM
To: r-help@r-project.org
Subject: Re: [R] Count matches of a sequence in a vector?


This sort of calculation can't be vectorized; you'll have to iterate through
the sequence, e.g. with a "for" loop.  I don't know if a routine has already
been written.

It can be partially vectorized:
f2 <- function (v, p) {
   retval <- TRUE
   i <- seq_len(length(v) - length(p) + 1L) - 1L
   for (j in seq_along(p)) {
       retval <- retval & v[i + j] == p[j]
   }
   retval
}
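
To make the indexing concrete, here is a minimal sketch that runs the same logic on a tiny vector. The names match_starts, v_small and p_small are hypothetical and only serve the illustration; the body is the f2 above with comments added.

```r
# Same logic as f2 above, under a hypothetical name so it runs on its own.
match_starts <- function(v, p) {
  hits <- TRUE
  # Zero-based offsets of every window of length(p) that fits inside v.
  i <- seq_len(length(v) - length(p) + 1L) - 1L
  for (j in seq_along(p)) {
    # v[i + j] picks the j-th element of every window in one step.
    hits <- hits & v[i + j] == p[j]
  }
  hits
}

v_small <- c(1, 2, 3, 4, 2, 3, 4, 5)  # made-up data
p_small <- 2:4

hits <- match_starts(v_small, p_small)
print(hits)         # one logical per candidate start position
print(which(hits))  # start positions of the matches
print(sum(hits))    # number of matches
```

For this input the pattern 2, 3, 4 starts at positions 2 and 5, so which(hits) gives 2 and 5 and sum(hits) gives 2. The loop runs length(p) times, not length(v) times, which is where the speed comes from.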

I understood the task to be to count the number of matches, so this modification would do that:

> f2 <- function (v, p) {
+    retval <- TRUE
+    i <- seq_len(length(v) - length(p) + 1L) - 1L
+    for (j in seq_along(p)) {
+        retval <- retval & v[i + j] == p[j]
+    }
+    # sum() counts the TRUE entries, i.e. the number of matches
+    sum(retval)
+ }
> f2(v, vseq)
[1] 1
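
A counting function like this is easy to sanity-check against a plain loop over every window. The following is only a sketch: count_naive, count_vectorized, v_test and p_test are hypothetical names, and the test data are made up.

```r
# Hypothetical helper: count matches by checking every window explicitly.
count_naive <- function(v, p) {
  n_windows <- length(v) - length(p) + 1L
  if (n_windows < 1L) return(0L)
  count <- 0L
  for (start in seq_len(n_windows)) {
    window <- v[start:(start + length(p) - 1L)]
    if (all(window == p)) count <- count + 1L
  }
  count
}

# Hypothetical helper: loop over the short pattern, then sum the logicals.
count_vectorized <- function(v, p) {
  hits <- TRUE
  i <- seq_len(length(v) - length(p) + 1L) - 1L
  for (j in seq_along(p)) {
    hits <- hits & v[i + j] == p[j]
  }
  sum(hits)
}

set.seed(42)  # arbitrary seed, only to make the check repeatable
v_test <- sample(1:3, size = 200, replace = TRUE)
p_test <- c(1L, 2L)

# Both approaches should agree on the number of matches.
stopifnot(count_naive(v_test, p_test) == count_vectorized(v_test, p_test))
```

A short pattern over a small alphabet is used so that the test vector is likely to contain several matches, which makes the comparison more meaningful than a case with none.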

And that code also outpaces the earlier one I offered, isn't constrained to a length-three pattern, and may be more memory efficient, although the benchmark function does not report memory use:

> benchmark(
+    logsum(v, vseq),
+    summatches(v,vseq),
+    sumroll(v,vseq), f2(v, vseq),
+    order=c('replications', 'elapsed'), replications=1000)
                 test replications elapsed relative user.self sys.self user.child sys.child
4         f2(v, vseq)         1000   0.020     1.00     0.020    0.001          0         0
1     logsum(v, vseq)         1000   0.024     1.20     0.024    0.000          0         0
2 summatches(v, vseq)         1000   0.164     8.20     0.164    0.001          0         0
3    sumroll(v, vseq)         1000   1.023    51.15     1.024    0.005          0         0

E.g., for the following data
set.seed(1)
v <- sample(1:10, size=1e6, replace=TRUE)
p <- 2:4
compare using zoo::rollapply (which loops over the long v)
f1 <- function(v, p)rollapply(zoo(v), length(p), function(x)all(x==p))
and f2 (which loops over the short p).  I get

library(zoo)
system.time(r1 <- f1(v,p))
   user  system elapsed
  13.17    0.06   13.25
system.time(r2 <- f2(v,p))
   user  system elapsed
   0.12    0.00    0.12
identical(which(r1), which(r2))
[1] TRUE

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT
