Thanks! That worked like a charm. Math
Gabor Grothendieck wrote
> On Fri, Jul 13, 2012 at 1:41 PM, mdvaan <mathijsdevaan@> wrote:
>> Here's some data (which should give you the error messages):
>>
>> # read in data
>> data <- read.csv("https://dl.dropbox.com/u/13631687/data.csv", header = T, sep = ",")
>>
>> # first paste all data
>> data1 <- paste(data[,1], collapse = "|")
>>
>> # second paste subsets of the data
>> data2a <- paste(data[1:750,1], collapse = "|")
>> data2b <- paste(data[751:1500,1], collapse = "|")
>>
>> # define the object to be searched
>> text <- c("the first is Santa Fe Gold Corp", "the second is Starpharma Holdings")
>>
>> # match
>> strapplyc(text, data1)
>> strapplyc(text, data2a)
>> strapplyc(text, data2b)
>>
>> Thanks in advance!
>
> Although it seems that strapplyc can handle larger regular expressions
> than grep in R it seems neither can handle as many as in your example
> so process it in chunks:
>
> k <- 3000 # chunk size
>
> f <- function(from, text) {
>   to <- min(from + k - 1, nrow(data))
>   r <- paste(data[seq(from, to), 1], collapse = "|")
>   r <- gsub("[().*?+{}]", "", r)
>   strapply(text, r)
> }
> ix <- seq(1, nrow(data), k)
> out <- lapply(text, function(text) unlist(lapply(ix, f, text)))
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>
> ______________________________________________
> R-help@ mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
View this message in context: http://r.789695.n4.nabble.com/Maximum-number-of-patterns-and-speed-in-grep-tp4635613p4636657.html
Sent from the R help mailing list archive at Nabble.com.
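
For anyone finding this thread in the archive, here is a rough base-R sketch of the same chunking idea. It is not the code from the thread: it uses regmatches() and gregexpr() in place of gsubfn's strapply(), and it escapes regex metacharacters in the names rather than deleting them, which is a different choice from the gsub() call quoted above. The `patterns` vector, the chunk size of 2, and the helper names `escape_regex` and `match_in_chunks` are made up for illustration; `patterns` stands in for the first column of `data`.

```r
# Hypothetical stand-in for data[, 1] from the thread
patterns <- c("Santa Fe Gold Corp", "Starpharma Holdings",
              "Acme (UK) Ltd.", "Foo+Bar Inc")
text <- c("the first is Santa Fe Gold Corp",
          "the second is Starpharma Holdings")

k <- 2  # chunk size, tiny here for illustration; the thread used 3000

# Backslash-escape regex metacharacters so each name is matched literally
escape_regex <- function(x) gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", x)

# Search one string against the patterns, one chunk of k patterns at a time
match_in_chunks <- function(txt, patterns, k) {
  starts <- seq(1, length(patterns), by = k)
  hits <- lapply(starts, function(from) {
    to <- min(from + k - 1, length(patterns))
    rx <- paste(escape_regex(patterns[from:to]), collapse = "|")
    regmatches(txt, gregexpr(rx, txt))[[1]]  # character(0) if nothing matches
  })
  unlist(hits)
}

# One result vector per element of text
out <- lapply(text, match_in_chunks, patterns = patterns, k = k)
print(out)
```

Each chunk becomes its own, much shorter alternation, so no single regular expression has to hold every name at once. A suitable chunk size for real data is something to find by experiment.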