Re: [R] splitting a vector of strings
Dear Eric,

I think you are looking for sub or gsub. Without an example set of input and output I am not quite sure, but you would need to define an expression which matches your separator (;) followed by any characters up to the end of the line. If you have trouble with that, someone here will no doubt write the pattern for you, but learning about regular expressions is well worthwhile.

On 21/07/2016 12:54, Eric Elguero wrote:
> Hi everybody,
>
> I have a vector of character strings. Each string has the same pattern and I want to split them in pieces and get a vector made of the first pieces of each string.
>
> The problem is that strsplit returns a list.
>
> All I found is
>
>     uu<- matrix(unlist(strsplit(x,";")),ncol=3,byrow=T)[,1]
>
> where x is the vector ";" is the delimiting character and I know that each string will be cut in 3 pieces.
>
> That works for my problem but I would prefer a more elegant solution. Besides, it would not work if all the string didn't have the same number of pieces.
>
> does someone have a better solution?
>
> sorry if that topic was discussed recently. There is too much traffic on the r-help list, I cannot catch up.

--
Michael
http://www.dewey.myzen.co.uk/home.html

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
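The sub approach suggested above could be sketched roughly as follows. This is only an illustration: the vector x is a made-up example, and it assumes the separator is a literal ";". The pattern ";.*" matches the first separator and everything after it, so sub() leaves just the first piece of each string.

```r
# Hypothetical example input; the real data would replace this
x <- c("a;b;c", "d;e", "foo;g;h;i")

# Delete the first ";" and everything that follows it.
# Strings that contain no ";" are returned unchanged.
first <- sub(";.*", "", x)
first
```

Because sub() is vectorised, the result is already a character vector of the same length as x, so no unlist() or matrix reshaping is needed.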
Re: [R] splitting a vector of strings
Hi,

I'm not sure about the more generalized solution, but how about this for a start.

    x <- c("a;b;c", "d;e", "foo;g;h;i")
    x
    #[1] "a;b;c"     "d;e"       "foo;g;h;i"
    sapply(strsplit(x, ";", fixed = TRUE), '[', 1)
    #[1] "a"   "d"   "foo"

If you want elegance then I suggest you take a look at the stringr package.

https://cran.r-project.org/web/packages/stringr/index.html

Cheers,
Ben

> On Jul 21, 2016, at 7:54 AM, Eric Elguero wrote:
>
> Hi everybody,
>
> I have a vector of character strings.
> Each string has the same pattern and I want
> to split them in pieces and get a vector made
> of the first pieces of each string.
>
> The problem is that strsplit returns a list.
>
> All I found is
>
> uu<- matrix(unlist(strsplit(x,";")),ncol=3,byrow=T)[,1]
>
> where x is the vector ";" is the delimiting character
> and I know that each string will be cut in 3 pieces.
>
> That works for my problem but I would prefer a
> more elegant solution. Besides, it would not
> work if all the string didn't have the same
> number of pieces.
>
> does someone have a better solution?

Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org
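A possible variation on the sapply call above, offered only as a sketch: vapply does the same job but declares that each result must be a single character string, which guards against surprises when the input is irregular. The helper name nth_piece is hypothetical and introduced here purely for illustration.

```r
# Hypothetical example input with a varying number of pieces
x <- c("a;b;c", "d;e", "foo;g;h;i")

# nth_piece() is an illustrative helper, not part of base R.
# It returns the n-th piece of each string, or NA where a string
# has fewer than n pieces.
nth_piece <- function(x, n, sep = ";") {
  pieces <- strsplit(x, sep, fixed = TRUE)
  vapply(pieces, `[`, character(1), n)
}

first <- nth_piece(x, 1)
third <- nth_piece(x, 3)
```

Indexing past the end of a piece vector yields NA rather than an error, so strings with fewer pieces simply produce a missing value in that position.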
[R] splitting a vector of strings
Hi everybody,

I have a vector of character strings. Each string has the same pattern and I want to split them into pieces and get a vector made of the first piece of each string.

The problem is that strsplit returns a list.

All I found is

    uu <- matrix(unlist(strsplit(x, ";")), ncol = 3, byrow = TRUE)[, 1]

where x is the vector, ";" is the delimiting character, and I know that each string will be cut into 3 pieces.

That works for my problem but I would prefer a more elegant solution. Besides, it would not work if the strings did not all have the same number of pieces.

Does anyone have a better solution?

Sorry if this topic was discussed recently. There is too much traffic on the r-help list, I cannot catch up.

--
Eric Elguero

MIVEGEC. - UMR (CNRS/IRD/UM) 5290
Maladies Infectieuses et Vecteurs, Génétique, Evolution et Contrôle
Institut de Recherche pour le Développement (IRD)
911, Avenue Agropolis
BP 64501
34394 Montpellier Cedex 5, France
Re: [R] splitting a vector of strings...
The following works: the double backslash removes the "or" meaning that | has in a regex. (Bill Dunlap showed that you don't need sapply for it to work.)

    xs <- "this is | string"
    xsv <- paste(xs, 1:10)
    strsplit(xsv, "\\|")

On Oct 23, 3:50 pm, Jonathan Greenberg wrote:
> William et al:
>
> Thanks! I think I have a somewhat more complicated issue due to the
> type of string I'm using -- the split is " | " (space pipe space) -- how
> do I code that based on your sub code below? Using " | *" doesn't seem
> to be working. Thanks!
>
> --j

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
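Note that splitting on "\\|" alone leaves the surrounding spaces attached to the pieces. A rough sketch of some ways to split on the whole " | " delimiter follows; the input is a made-up example, and the three calls are meant to be equivalent for this kind of data.

```r
# Hypothetical example input using " | " as the delimiter
xsv <- paste("this is | string", 1:3)

# 1. Escape the pipe so it is matched literally
a <- strsplit(xsv, " \\| ")

# 2. Put the pipe in a character class, where it has no special meaning
b <- strsplit(xsv, " [|] ")

# 3. Skip regular expressions entirely and match the text as-is
d <- strsplit(xsv, " | ", fixed = TRUE)

d[[1]]
```

With fixed = TRUE no escaping is needed at all, which is often the least error-prone choice when the delimiter is a known literal string.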
Re: [R] splitting a vector of strings...
William et al:

Thanks! I think I have a somewhat more complicated issue due to the type of string I'm using -- the split is " | " (space pipe space) -- how do I code that based on your sub code below? Using " | *" doesn't seem to be working. Thanks!

--j

William Dunlap wrote:
> strsplit and sub can both be used for this. If you know
> the string will be split into 2 parts then 2 calls to sub
> with slightly different patterns will do it. strsplit requires
> less fiddling with the pattern and is handier when the number
> of parts is variable or large. strsplit's output often needs to
> be rearranged for convenient use.
>
> E.g., I made 100,000 strings with a 'qaz' in their middles with
>    x<-paste("X",sample(1e5),sep="")
>    y<-sub("X","Y",x)
>    xy<-paste(x,y,sep="qaz")
> and split them by the 'qaz' in two ways:
>    system.time(ret1<-list(x=sub("qaz.*","",xy),y=sub(".*qaz","",xy)))
>    #   user  system elapsed
>    #   0.22    0.00    0.21
>    system.time({tmp<-strsplit(xy,"qaz");ret2<-list(x=unlist(lapply(tmp,`[`,1)),y=unlist(lapply(tmp,`[`,2)))})
>    #   user  system elapsed
>    #   2.42    0.00    2.20
>    identical(ret1,ret2)
>    #[1] TRUE
>    identical(ret1$x,x) && identical(ret1$y,y)
>    #[1] TRUE
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com

--
Jonathan A. Greenberg, PhD
Postdoctoral Scholar
Center for Spatial Technologies and Remote Sensing (CSTARS)
University of California, Davis
One Shields Avenue
The Barn, Room 250N
Davis, CA 95616
Phone: 415-763-5476
AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307
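For the " | " case asked about here, the two-call sub approach might be adapted along these lines. In a regular expression an unescaped | means alternation, so a pattern such as " | *" is read as "a space, or zero or more spaces" rather than as a literal pipe; escaping the pipe avoids that. This is a sketch only: the input vector is invented, and it assumes the delimiter occurs exactly once in every string.

```r
# Hypothetical example input: two fields separated by " | "
xy <- c("left one | right one", "a b | c d")

# Keep what precedes the delimiter: delete " | " and everything after it
left <- sub(" \\| .*", "", xy)

# Keep what follows the delimiter: delete everything up to and including " | "
right <- sub(".* \\| ", "", xy)

ret <- list(x = left, y = right)
```

If a string could contain the delimiter more than once, the greedy ".*" in the second pattern would consume up to the last occurrence, so the two halves would no longer cover the whole string.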
Re: [R] splitting a vector of strings...
    xs <- "this is string"
    xsv <- paste(xs, 1:10)
    sapply(xsv, function(x) strsplit(x, '\\sis\\s'))

This will split each string in the vector "xsv" on the word 'is' where it has a whitespace character immediately before and after it.

On Oct 23, 1:34 pm, Jonathan Greenberg wrote:
> Quick question -- if I have a vector of strings that I'd like to split
> into two new vectors based on a substring that is inside of each string,
> what is the most efficient way to do this? The substring that I want to
> split on is multiple characters, if that matters, and it is contained in
> every element of the character vector.
>
> --j
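Since strsplit is already vectorised over its first argument, the sapply wrapper is not strictly needed. One way to get from the resulting list to the two vectors the question asks for is sketched below. It assumes every string splits into exactly two pieces; with ragged results rbind would recycle values and warn.

```r
# Hypothetical example input, following the example above
xsv <- paste("this is string", 1:3)

# strsplit handles the whole vector in a single call
parts <- strsplit(xsv, "\\sis\\s")

# Stack the pieces into a two-column character matrix, one row per string
m <- do.call(rbind, parts)

before <- m[, 1]
after  <- m[, 2]
```

The matrix form is convenient when the pieces are destined for columns of a data frame.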
Re: [R] splitting a vector of strings...
> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Jonathan Greenberg
> Sent: Thursday, October 22, 2009 7:35 PM
> To: r-help
> Subject: [R] splitting a vector of strings...
>
> Quick question -- if I have a vector of strings that I'd like
> to split into two new vectors based on a substring that is inside of
> each string, what is the most efficient way to do this? The substring
> that I want to split on is multiple characters, if that matters, and it is
> contained in every element of the character vector.

strsplit and sub can both be used for this. If you know the string will be split into 2 parts then 2 calls to sub with slightly different patterns will do it. strsplit requires less fiddling with the pattern and is handier when the number of parts is variable or large. strsplit's output often needs to be rearranged for convenient use.

E.g., I made 100,000 strings with a 'qaz' in their middles with

    x <- paste("X", sample(1e5), sep = "")
    y <- sub("X", "Y", x)
    xy <- paste(x, y, sep = "qaz")

and split them by the 'qaz' in two ways:

    system.time(ret1 <- list(x = sub("qaz.*", "", xy), y = sub(".*qaz", "", xy)))
    #   user  system elapsed
    #   0.22    0.00    0.21
    system.time({
      tmp <- strsplit(xy, "qaz")
      ret2 <- list(x = unlist(lapply(tmp, `[`, 1)), y = unlist(lapply(tmp, `[`, 2)))
    })
    #   user  system elapsed
    #   2.42    0.00    2.20
    identical(ret1, ret2)
    #[1] TRUE
    identical(ret1$x, x) && identical(ret1$y, y)
    #[1] TRUE

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
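The comparison above can be re-run at a smaller scale along the following lines. This is a sketch rather than a benchmark: the size n and the seed are arbitrary choices, fixed = TRUE is a swapped-in option that was not part of the original timing, and timings will differ by machine and R version, so none are shown.

```r
set.seed(1)   # arbitrary seed, only so the example is repeatable
n <- 1000     # much smaller than the 1e5 used above

x  <- paste("X", sample(n), sep = "")
y  <- sub("X", "Y", x)
xy <- paste(x, y, sep = "qaz")

# Approach 1: two calls to sub
by_sub <- list(x = sub("qaz.*", "", xy), y = sub(".*qaz", "", xy))

# Approach 2: strsplit with a literal (non-regex) delimiter,
# then pull out the first and second piece of each element
tmp <- strsplit(xy, "qaz", fixed = TRUE)
by_split <- list(x = vapply(tmp, `[`, character(1), 1),
                 y = vapply(tmp, `[`, character(1), 2))

identical(by_sub, by_split)
```

Wrapping either approach in system.time(), as in the message above, shows how they compare on a given machine.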
[R] splitting a vector of strings...
Quick question -- if I have a vector of strings that I'd like to split into two new vectors based on a substring that is inside of each string, what is the most efficient way to do this? The substring that I want to split on is multiple characters, if that matters, and it is contained in every element of the character vector.

--j

--
Jonathan A. Greenberg, PhD
Postdoctoral Scholar
Center for Spatial Technologies and Remote Sensing (CSTARS)
University of California, Davis
One Shields Avenue
The Barn, Room 250N
Davis, CA 95616
Phone: 415-763-5476
AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307