> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Jonathan Greenberg
> Sent: Thursday, October 22, 2009 7:35 PM
> To: r-help
> Subject: [R] splitting a vector of strings...
>
> Quick question -- if I have a vector of strings that I'd like to split
> into two new vectors based on a substring that is inside of each string,
> what is the most efficient way to do this? The substring that I want to
> split on is multiple characters, if that matters, and it is contained in
> every element of the character vector.

strsplit and sub can both be used for this. If you know each string
will be split into 2 parts, then 2 calls to sub with slightly different
patterns will do it. strsplit requires less fiddling with the pattern
and is handier when the number of parts is variable or large, but its
output often needs to be rearranged for convenient use.

E.g., I made 100,000 strings with a 'qaz' in their middles with

   x <- paste("X", sample(1e5), sep="")
   y <- sub("X", "Y", x)
   xy <- paste(x, y, sep="qaz")

and split them by the 'qaz' in two ways:

   # two calls to sub: drop everything from 'qaz' on, then everything up to 'qaz'
   system.time(ret1 <- list(x=sub("qaz.*", "", xy), y=sub(".*qaz", "", xy)))
   #   user  system elapsed
   #   0.22    0.00    0.21

   # strsplit, then pull the 1st and 2nd piece out of each list element
   system.time({
      tmp <- strsplit(xy, "qaz")
      ret2 <- list(x=unlist(lapply(tmp, `[`, 1)),
                   y=unlist(lapply(tmp, `[`, 2)))
   })
   #   user  system elapsed
   #   2.42    0.00    2.20

   identical(ret1, ret2)
   #[1] TRUE
   identical(ret1$x, x) && identical(ret1$y, y)
   #[1] TRUE

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

>
> --j
>
> --
>
> Jonathan A. Greenberg, PhD
> Postdoctoral Scholar
> Center for Spatial Technologies and Remote Sensing (CSTARS)
> University of California, Davis
> One Shields Avenue
> The Barn, Room 250N
> Davis, CA 95616
> Phone: 415-763-5476
> AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
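
Since the separator in this thread is a literal string rather than a regular expression, two further variants may be worth trying: locating it with regexpr(..., fixed=TRUE) and cutting with substr, or calling strsplit with fixed=TRUE and rearranging with vapply instead of unlist(lapply(...)). The following is only a minimal sketch on a three-element toy vector. split_once is a hypothetical helper name, not something from base R or from the reply above, and no timings are claimed for either variant, so check with system.time on your own data.

```r
# Hypothetical helper (an assumption for illustration): split each string at
# the FIRST occurrence of a literal, possibly multi-character, separator.
split_once <- function(s, sep) {
  # fixed = TRUE treats 'sep' as a plain string, not a regular expression
  pos <- as.integer(regexpr(sep, s, fixed = TRUE))  # -1 where 'sep' is absent
  if (anyNA(pos) || any(pos < 0)) {
    stop("separator missing from some elements")
  }
  list(x = substr(s, 1, pos - 1),
       y = substr(s, pos + nchar(sep), nchar(s)))
}

# Toy data in the same shape as the example in the reply
xy <- c("X1qazY1", "X22qazY22", "X333qazY333")
res <- split_once(xy, "qaz")

# strsplit variant: fixed = TRUE skips the regex engine, and vapply returns a
# character vector directly (and errors if a piece is not a single string)
parts <- strsplit(xy, "qaz", fixed = TRUE)
res2 <- list(x = vapply(parts, `[`, character(1), 1),
             y = vapply(parts, `[`, character(1), 2))

identical(res, res2)
```

One caveat when comparing approaches: they only agree when the separator occurs exactly once per string. sub("qaz.*", "", xy) cuts at the first occurrence, sub(".*qaz", "", xy) cuts at the last one because .* is greedy, and strsplit splits at every occurrence.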