How's that not reproducible?

1. Data frame, one column with text strings
2. Size of data frame = 4 million observations
3. A bunch of gsubs in a row ( gsub(patternvector, "[token]", dataframe$text_column) )
4. General question: How to speed up string operations on 'large' data sets?
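
For concreteness, here is a minimal sketch of what one of the steps in points 1 to 3 might look like. The toy data, the contents of `patternvector`, and the variable `pattern` are made up for illustration and are not my real data. It also assumes the pattern vector is collapsed into a single alternation first, since gsub() itself only uses the first element of a `pattern` argument that has length greater than one.

```r
# Hypothetical toy data standing in for the real 4-million-row data frame.
dataframe <- data.frame(
  text_column = c("I like apples and pears",
                  "bananas are fine",
                  "no fruit here"),
  stringsAsFactors = FALSE
)

# Hypothetical pattern vector (the real ones contain up to 500 elements).
patternvector <- c("apples", "pears", "bananas")

# Collapse the vector into one alternation so a single gsub() call covers
# every element. Assumption: the elements are plain words with no regex
# metacharacters that would need escaping.
pattern <- paste0("\\b(", paste(patternvector, collapse = "|"), ")\\b")

# One pass over the column; perl = TRUE selects the PCRE engine.
dataframe$text_column <- gsub(pattern, "[token]", dataframe$text_column,
                              perl = TRUE)
```

Whether a single collapsed pattern is actually faster than several separate calls on the full data is something I would have to time, for example with system.time().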
Please let me know what more information you need in order to reproduce this example. It's more of a general question, though I think the description above gives a specific picture of what I'm doing right now.

On 05.11.2013 at 06:59, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote:

> Example not reproducible. Communication fail. Please refer to Posting Guide.
> ---------------------------------------------------------------------------
> Jeff Newmiller
> DCN: <jdnew...@dcn.davis.ca.us>
> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> ---------------------------------------------------------------------------
> Sent from my phone. Please excuse my brevity.
>
> Simon Pickert <simon.pick...@t-online.de> wrote:
>> Hi R'lers,
>>
>> I'm running into speed issues, performing a bunch of
>>
>> "gsub(patternvector, [token], dataframe$text_column)"
>>
>> on a data frame containing more than 4 million entries.
>>
>> (The "patternvectors" contain up to 500 elements)
>>
>> Is there any better/faster way than performing like 20 gsub commands in
>> a row?
>>
>> Thanks!
>> Simon
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.