Hi Martin and everybody,

Sorry for the long delay, and thanks for all the suggestions. With my code and my training data I get numbers similar to the ones below.

Thanks

Cheers

Fabien

I did this to generate and search 40 million unique strings

> grams <- as.character(1:4e7)        ## a long time passes...
> system.time(grep("^900001", grams)) ## similar times to grepl
   user  system elapsed
 10.384   0.168  10.543

Is that the basic task you're trying to accomplish? grep(l) goes quickly to C, so I don't think data.table or other packages will be markedly faster if you're looking for an arbitrary regular expression (use fixed=TRUE if you're looking for a literal string rather than a regular expression).
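As a small illustration of the difference between these kinds of match, here is a sketch on a much smaller vector than the one above (the size 1e6 and the pattern "90001" are chosen only to keep it quick). Note that fixed=TRUE matches the literal text anywhere in the string, so it is not the same as an anchored regular expression or a whole-string comparison:

```r
# Smaller vector than in the timing above, purely for illustration
grams <- as.character(1:1e6)

# Anchored regular expression: strings that start with "90001"
hits_regex <- grepl("^90001", grams)

# fixed = TRUE: literal substring match anywhere in the string
hits_fixed <- grepl("90001", grams, fixed = TRUE)

# Whole-string equality: only the string "90001" itself
hits_exact <- grams == "90001"

# Every anchored match is also a substring match, but not the reverse
all(hits_fixed[hits_regex])
sum(hits_exact)
```

Which of the three is appropriate depends on whether you need a prefix, a substring, or the whole string.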

If you're looking for strings that start with a pattern, then in R-3.3.0 there is

> system.time(res0 <- startsWith(grams, "900001"))
   user  system elapsed
  0.658   0.012   0.669

which returns the same result as grepl

> identical(res0, res1 <- grepl("^900001", grams))
[1] TRUE
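One caveat, shown here as a small sketch with made-up strings: startsWith treats its prefix as literal text, whereas grepl interprets regular expression metacharacters. The two agree above because "900001" contains no metacharacters; for a prefix such as "a." the regular expression needs escaping to give the same answer:

```r
x <- c("a.b", "axb", "ba.")

# Literal prefix: only "a.b" starts with the two characters "a."
starts_literal <- startsWith(x, "a.")

# Unescaped regex: "." matches any character, so "axb" matches too
starts_regex <- grepl("^a.", x)

# Escaping the dot restores agreement with startsWith
starts_escaped <- grepl("^a\\.", x)

identical(starts_literal, starts_escaped)
```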

One can also parallelize the already vectorized grepl function with parallel::pvec (after loading the parallel package), with some opportunity for gain (compared to grepl) on non-Windows platforms

> system.time(res2 <- pvec(seq_along(grams), function(i) grepl("^900001", grams[i]), mc.cores=8))
   user  system elapsed
 24.996   1.709   3.974
> identical(res0, res2)
[1] TRUE

I think anything else would require pre-processing of some kind, and then some more detail about what your data looks like is required.
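To make "pre-processing" concrete, here is one possible sketch, not necessarily what would suit your data: if many prefix lookups are run against the same vector, an index keyed on the first few characters can be built once, so each query only scans a small bucket. The names prefix_len, index and lookup_prefix are hypothetical, and the sketch assumes every query prefix is at least prefix_len characters long:

```r
# Small stand-in for the real data
grams <- as.character(1:1e5)

# Hypothetical key length; queries must be at least this long
prefix_len <- 3L

# One-off pre-processing: positions grouped by their leading characters
index <- split(seq_along(grams), substr(grams, 1, prefix_len))

# Return the positions in grams whose string starts with `prefix`
lookup_prefix <- function(prefix) {
  stopifnot(nchar(prefix) >= prefix_len)
  candidates <- index[[substr(prefix, 1, prefix_len)]]
  if (is.null(candidates)) return(integer(0))  # no bucket for this key
  candidates[startsWith(grams[candidates], prefix)]
}

# Same positions as a full scan with grepl
identical(lookup_prefix("9000"), which(grepl("^9000", grams)))
```

Building the index has its own cost, so this only pays off when the same vector is searched repeatedly.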

--
Dr Fabien Tarrade

Quantitative Analyst/Developer - Data Scientist

Senior data analyst specialised in the modelling, processing and statistical treatment of data. PhD in Physics, 10 years of experience as researcher at the forefront of international scientific research.
Fascinated by finance and data modelling.

Geneva, Switzerland

Email : cont...@fabien-tarrade.eu <mailto:cont...@fabien-tarrade.eu>
Website : www.fabien-tarrade.eu <http://www.fabien-tarrade.eu>
Phone : +33 (0)6 14 78 70 90

LinkedIn <http://ch.linkedin.com/in/fabientarrade/> Twitter <https://twitter.com/fabtar> Google <https://plus.google.com/+FabienTarradeProfile/posts> Facebook <https://www.facebook.com/fabien.tarrade.eu> Skype <skype:fabtarhiggs?call> Xing <https://www.xing.com/profile/Fabien_Tarrade>

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.