On Apr 21, 2010, at 5:19 PM, William Dunlap wrote:
-----Original Message-----
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Jeff Brown
Sent: Wednesday, April 21, 2010 8:08 AM
To: r-help@r-project.org
Subject: Re: [R] Count matches of a sequence in a vector?
This sort of calculation can't be vectorized; you'll have to iterate
through the sequence, e.g. with a "for" loop. I don't know if a routine
has already been written.
It can be partially vectorized:
f2 <- function (v, p) {
    retval <- TRUE
    i <- seq_len(length(v) - length(p) + 1L) - 1L
    for (j in seq_along(p)) {
        retval <- retval & v[i + j] == p[j]
    }
    retval
}
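To make the idea concrete, here is a minimal sketch on a small vector. The helper name match_starts, the guard for patterns longer than the vector, and the toy data are assumptions added for illustration; they are not part of the code posted above.

```r
# Hypothetical helper (not from the thread): same approach as f2, plus a
# guard so that a pattern longer than the vector returns an empty result
# instead of making seq_len() fail on a negative length.
match_starts <- function(v, p) {
  n <- length(v) - length(p) + 1L    # number of candidate start positions
  if (length(p) == 0L || n < 1L) return(logical(0))
  i <- seq_len(n) - 1L               # zero-based offset of each window
  hit <- rep(TRUE, n)
  for (j in seq_along(p)) {          # loop over the short pattern only
    hit <- hit & v[i + j] == p[j]    # keep windows that still match
  }
  hit
}

v_small <- c(1, 2, 3, 4, 2, 3, 4, 2, 3)   # illustrative data
p_small <- 2:4
hits <- match_starts(v_small, p_small)
which(hits)   # start positions of the matches: 2 and 5
sum(hits)     # number of matches: 2
```

Each pass of the loop compares one pattern element against all windows at once, so the cost is length(p) vectorized comparisons rather than one R-level iteration per element of v.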
I understood the task to be to count the number of matches so this
modification would do that:
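> f2 <- function (v, p) {
+     retval <- TRUE
+     i <- seq_len(length(v) - length(p) + 1L) - 1L
+     for (j in seq_along(p)) {
+         # combine with `&`; `==` binds looser than `+`, so adding
+         # retval on the right-hand side would alter the comparison
+         retval <- retval & v[i + j] == p[j]
+     }
+     sum(retval)
+ }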
> f2(v, vseq)
[1] 1
And that code also outpaces the earlier one I offered, isn't
constrained to a length-three pattern, and may be more memory
efficient, although the benchmark function does not provide feedback
on that aspect:
> benchmark(
+ logsum(v, vseq),
+ summatches(v,vseq),
+ sumroll(v,vseq), f2(v, vseq),
+ order=c('replications', 'elapsed'), replications=1000)
                 test replications elapsed relative user.self sys.self user.child sys.child
4         f2(v, vseq)         1000   0.020     1.00     0.020    0.001          0         0
1     logsum(v, vseq)         1000   0.024     1.20     0.024    0.000          0         0
2 summatches(v, vseq)         1000   0.164     8.20     0.164    0.001          0         0
3    sumroll(v, vseq)         1000   1.023    51.15     1.024    0.005          0         0
E.g., for the following data
set.seed(1)
v <- sample(1:10, size=1e6, replace=TRUE)
p <- 2:4
compare using zoo::rollapply (which loops over the long v)
f1 <- function(v, p)rollapply(zoo(v), length(p), function(x)all(x==p))
and f2 (which loops over the short p). I get
library(zoo)
system.time(r1 <- f1(v,p))
user system elapsed
13.17 0.06 13.25
system.time(r2 <- f2(v,p))
user system elapsed
0.12 0.00 0.12
identical(which(r1), which(r2))
[1] TRUE
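For readers without zoo installed, the same kind of cross-check can be sketched with base R only. The function names count_loop_over_p and count_brute_force and the smaller test vector are assumptions made for illustration; the brute-force version stands in for the window-by-window approach and is not the rollapply call used above.

```r
# Hypothetical names; a base-R sketch, not the code from the thread.

# Loop over the short pattern, comparing all windows at once.
count_loop_over_p <- function(v, p) {
  n <- length(v) - length(p) + 1L
  if (n < 1L) return(0L)
  i <- seq_len(n) - 1L
  hit <- rep(TRUE, n)
  for (j in seq_along(p)) {
    hit <- hit & v[i + j] == p[j]
  }
  sum(hit)
}

# Loop over every window of the long vector, one at a time.
count_brute_force <- function(v, p) {
  n <- length(v) - length(p) + 1L
  if (n < 1L) return(0L)
  k <- length(p) - 1L
  sum(vapply(seq_len(n),
             function(s) all(v[s:(s + k)] == p),
             logical(1)))
}

set.seed(1)
v_test <- sample(1:10, size = 1e4, replace = TRUE)  # smaller than 1e6 to keep it quick
p_test <- 2:4

# Both approaches should agree on the number of matches.
identical(count_loop_over_p(v_test, p_test),
          count_brute_force(v_test, p_test))
```

Wrapping either call in system.time() should show the same pattern as above: the version that iterates over p does a handful of vectorized comparisons, while the window-by-window version makes one function call per element of v.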
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT