Hi Fabien, I was going to send this last night, but I thought it was too simple. Runs in about one millisecond.

df <- data.frame(freq = runif(1000),
                 strings = apply(matrix(sample(LETTERS, 10000, TRUE), ncol = 10),
                                 1, paste, collapse = ""))
match.ind <- grep("DF", df$strings)
match.ind
 [1]   2  11  91 133 169 444 547 605 734 943

Jim

On Mon, Apr 11, 2016 at 5:27 AM, Fabien Tarrade <fabien.tarr...@gmail.com> wrote:
> Hi Duncan,
>>
>> Didn't you post the same question yesterday? Perhaps nobody answered
>> because your question is unanswerable.
>
> Sorry, I got an email saying that my message was waiting for approval, and
> when I looked at the forum I didn't see my message. This is why I sent it
> again, and this time I checked that the format of my message was text only.
> Sorry for the noise.
>>
>> You need to describe what the strings are like and what the patterns are
>> like if you want advice on speeding things up.
>
> My strings are 1-grams up to 5-grams (sequences of 1 word up to 5 words), and
> I am searching my DF for the frequency of the strings starting with a
> sequence of a few words.
>
> I guess these days it is standard to use a DF with millions of entries, so I
> was wondering how people are doing that in the fastest way.
>
> Thanks
> Cheers
> Fabien
>
> --
> Dr Fabien Tarrade
>
> Quantitative Analyst/Developer - Data Scientist
>
> Senior data analyst specialised in the modelling, processing and statistical
> treatment of data.
> PhD in Physics, 10 years of experience as researcher at the forefront of
> international scientific research.
> Fascinated by finance and data modelling.
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
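
P.S. Since the question is about strings that start with a given sequence, here is a minimal sketch of a prefix-only search, as opposed to the match-anywhere grep("DF", ...) in the reply. The seed, the prefix value and the names idx.regex and idx.literal are assumptions made for illustration, and startsWith() needs R 3.3.0 or later. Which variant is faster on millions of rows would have to be timed on the real data.

```r
# Hypothetical data in the same shape as the example in the reply:
# 1000 random 10-letter strings with a frequency column.
set.seed(1)  # assumption: fixed seed so the sketch is repeatable
df <- data.frame(
  freq = runif(1000),
  strings = apply(matrix(sample(LETTERS, 10000, TRUE), ncol = 10),
                  1, paste, collapse = ""),
  stringsAsFactors = FALSE  # keep strings as character, not factor
)

prefix <- "DF"  # assumption: the leading sequence being searched for

# Anchored regular expression: "^" restricts the match to the start
# of each string (the prefix must not contain regex metacharacters).
idx.regex <- grep(paste0("^", prefix), df$strings)

# Literal comparison of the leading characters, no regex involved.
idx.literal <- which(startsWith(df$strings, prefix))

# Both approaches should select the same rows.
identical(idx.regex, idx.literal)

# Frequencies of the strings that begin with the prefix.
df$freq[idx.literal]
```

For real n-grams the prefix would be something like "the quick " (with the trailing space), so that only whole leading words are matched.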