It is not reproducible [1] because I cannot run your (representative) example. The type of regex pattern, the replacement token, and even the nature of the data you are searching can all affect which optimizations are possible. Note that a tool that does not need to hold the whole data set in memory, such as sed or perl, may be more appropriate for a problem like this.
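As a rough illustration of why the pattern type matters, here is a minimal sketch, assuming the patterns are plain literal words that do not overlap. The data and the names `text_column` and `patternvector` are small hypothetical stand-ins, not the poster's actual data. It compares one gsub() pass per pattern against a single pass over an alternation built with paste().

```r
# Hypothetical stand-ins for the real data (assumption, not from the post).
text_column   <- c("the cat sat", "a dog and a cat", "no pets here")
patternvector <- c("cat", "dog")

# Baseline: one gsub() pass per pattern, so the whole column is scanned
# length(patternvector) times. fixed = TRUE treats each pattern literally.
sequential <- text_column
for (p in patternvector) {
  sequential <- gsub(p, "[token]", sequential, fixed = TRUE)
}

# Alternative: collapse the patterns into one alternation ("cat|dog")
# and scan the column once. perl = TRUE selects the PCRE engine.
combined_pattern <- paste(patternvector, collapse = "|")
single_pass <- gsub(combined_pattern, "[token]", text_column, perl = TRUE)

# For these non-overlapping literal patterns both approaches agree.
stopifnot(identical(sequential, single_pass))
print(single_pass)
```

The two approaches are only interchangeable under the stated assumptions: if patterns overlap, contain regex metacharacters, or if an earlier replacement creates or destroys a later match, the results can differ. Whether the single pass is actually faster depends on the patterns and the data, so it would need to be timed on a representative sample.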
[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

Simon Pickert <simon.pick...@t-online.de> wrote:
>How’s that not reproducible?
>
>1. Data frame, one column with text strings
>2. Size of data frame = 4 million observations
>3. A bunch of gsubs in a row ( gsub(patternvector, "[token]", dataframe$text_column) )
>4. General question: How to speed up string operations on 'large' data sets?
>
>Please let me know what more information you need in order to reproduce
>this example.
>It’s more a general type of question, while I think the description
>above gives you a specific picture of what I’m doing right now.
>
>On 05.11.2013 at 06:59, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote:
>
>> Example not reproducible. Communication fail. Please refer to Posting Guide.
>>
>> Simon Pickert <simon.pick...@t-online.de> wrote:
>>> Hi R’lers,
>>>
>>> I’m running into speed issues, performing a bunch of
>>>
>>> gsub(patternvector, "[token]", dataframe$text_column)
>>>
>>> on a data frame containing >4 million entries.
>>>
>>> (The "patternvectors" contain up to 500 elements)
>>>
>>> Is there any better/faster way than performing like 20 gsub commands in
>>> a row?
>>>
>>> Thanks!
>>> Simon
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.