Hi: Here's one approach:
strings <- c("A5.Brands.bought...Dulux", "A5.Brands.bought...Haymes",
             "A5.Brands.bought...Solver", "A5.Brands.bought...Taubmans.or.Bristol",
             "A5.Brands.bought...Wattyl", "A5.Brands.bought...Other")
# split each string at the literal "..." (dots escaped in the regex)
slist <- strsplit(strings, '\\.\\.\\.')

# Conversion to data frame:
library(plyr)
ldply(slist, rbind)
                V1                  V2
1 A5.Brands.bought               Dulux
2 A5.Brands.bought              Haymes
3 A5.Brands.bought              Solver
4 A5.Brands.bought Taubmans.or.Bristol
5 A5.Brands.bought              Wattyl
6 A5.Brands.bought               Other

# Conversion to matrix:
laply(slist, rbind)      # plyr
do.call(rbind, slist)    # base R, no package needed

...and one can subselect from there.

HTH,
Dennis

On Tue, Apr 12, 2011 at 9:07 PM, Chris Howden <ch...@trickysolutions.com.au> wrote:

> Hi Everyone,
>
> I needed to parse some strings recently.
>
> The code I've wound up using seems rather clunky, and I was wondering if
> anyone had any suggestions on a better way.
>
> Basically I do the following:
>
> 1) Use substr() to do the parsing
> 2) Use regexpr() to find the location of the string I want to parse on; I
> then pass this to substr()
> 3) Use nchar() as the stop input to substr() where necessary
>
> I've got a simple example of the parsing code I used below. It takes
> questionnaire variable names that include the question and the brand it
> was answered for, and then parses them so the variable name and the brand
> are in separate columns. I then use this to restructure the data from
> unstacked to stacked, but that's another story.
>
> > # this is the data set
> > test
> [1] "A5.Brands.bought...Dulux"
> [2] "A5.Brands.bought...Haymes"
> [3] "A5.Brands.bought...Solver"
> [4] "A5.Brands.bought...Taubmans.or.Bristol"
> [5] "A5.Brands.bought...Wattyl"
> [6] "A5.Brands.bought...Other"
>
> > # Where do I want to parse?
> > break1 <- regexpr('...', test, fixed=TRUE)
> > break1
> [1] 17 17 17 17 17 17
> attr(,"match.length")
> [1] 3 3 3 3 3 3
>
> > # Put Variable name in a variable
> > str1 <- substr(test, 1, break1-1)
> > str1
> [1] "A5.Brands.bought" "A5.Brands.bought" "A5.Brands.bought" "A5.Brands.bought"
> [5] "A5.Brands.bought" "A5.Brands.bought"
>
> > # Put Brand name in a variable
> > str2 <- substr(test, break1+3, nchar(test))
> > str2
> [1] "Dulux" "Haymes" "Solver"
> [4] "Taubmans.or.Bristol" "Wattyl" "Other"
>
> Thanks for any and all suggestions
>
> Chris Howden
> Founding Partner
> Tricky Solutions
> Tricky Solutions 4 Tricky Problems
> Evidence Based Strategic Development, IP Commercialisation and Innovation,
> Data Analysis, Modelling and Training
> (mobile) 0410 689 945
> (fax / office) (+618) 8952 7878
> ch...@trickysolutions.com.au
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
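For anyone who would rather not load plyr, here is a minimal base-R sketch of the same strsplit() idea. It assumes every name contains exactly one "..." separator, and the column names question and brand are my own illustrative choice, not from the original post. Passing fixed = TRUE makes strsplit() treat "..." as a literal string, so the dots need no escaping.

```r
# Base-R sketch: split at the literal "..." and build a two-column data frame.
# Assumes each name contains exactly one "..." separator.
strings <- c("A5.Brands.bought...Dulux", "A5.Brands.bought...Haymes",
             "A5.Brands.bought...Solver", "A5.Brands.bought...Taubmans.or.Bristol",
             "A5.Brands.bought...Wattyl", "A5.Brands.bought...Other")

# fixed = TRUE: "..." is matched literally, not as a regular expression
slist <- strsplit(strings, "...", fixed = TRUE)

# each element is c(question, brand); stack them into a 6 x 2 character matrix
m <- do.call(rbind, slist)

# column names are hypothetical, chosen for illustration
parsed <- data.frame(question = m[, 1], brand = m[, 2],
                     stringsAsFactors = FALSE)
parsed
```

If a name could contain more than one "...", strsplit() would return more than two pieces for it and do.call(rbind, ...) would no longer give a clean two-column result; the regexpr()/substr() version in the original post splits only at the first match, which is safer in that case.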