Thank you, and sorry for the confusion. The desired result should have 8 comma-separated variables in each line; the string variable counts as one variable. The output of your script is fine for me. Thank you!
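For anyone reading the archive, here is a minimal base R sketch of the eight-column layout described above (Year, Sex, the original string, then S1 to S5). It is not the tidyr script quoted below; the names `parts`, `padded`, and `result` are made up for illustration, and it assumes the pieces are separated by single spaces and that no row has more than five of them.

```r
dat <- read.csv(text = "Year,Sex,string
2002,F,15 xc Ab
2003,F,14
2004,M,18 xb 25 35 21
2005,M,13 25
2006,M,14 ac 256 AV 35
2007,F,11", header = TRUE, stringsAsFactors = FALSE)

# Split each string on single spaces (assumption: one space between pieces)
parts <- strsplit(dat$string, " ", fixed = TRUE)

# Pad every row's pieces to length 5; `length<-` fills the tail with NA
padded <- t(vapply(parts, function(p) { length(p) <- 5; p }, character(5)))
colnames(padded) <- paste0("S", 1:5)

# Keep the original 'string' column and append S1..S5: 8 columns in total
result <- cbind(dat, as.data.frame(padded, stringsAsFactors = FALSE))

# Print as comma-separated lines
write.csv(result, row.names = FALSE, quote = FALSE)
```

All of S1 to S5 come out as character columns here, since the pieces mix numbers and text. With tidyr, passing remove = FALSE to separate() should likewise keep the original string column.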
On Fri, Jul 19, 2024 at 1:00 PM Ebert,Timothy Aaron <teb...@ufl.edu> wrote:
>
> The desired result is odd.
> 1) It looks like the string is duplicated in the desired result. The first
> line of data has "15, xc, Ab", and the desired result has "15, xc, Ab, 15,
> xc, Ab"
> 2) The example has S1 through S5, but the desired result has data for eight
> variables in the first line (not five).
> 3) The desired result has a different number of variables for each line.
> 4) Are you assuming that all missing data is at the end of the string? If
> there are 5 variables (S1 .... S5), do you know that "15, xc, Ab" is S1 = 15,
> S2 = 'xc', and S3 = 'Ab' rather than S2=15, S4='xc' and S5='Ab' ?
>
> This isn't exactly what you asked for, but maybe I was confused somewhere.
> This approach puts string data into variables in order. In this approach one
> mixes string and numeric data. The string is not duplicated.
>
> library(tidyr)
>
> dat <- read.csv(text="Year,Sex,string
> 2002,F,15 xc Ab
> 2003,F,14
> 2004,M,18 xb 25 35 21
> 2005,M,13 25
> 2006,M,14 ac 256 AV 35
> 2007,F,11", header=TRUE, stringsAsFactors=FALSE)
>
> # split the 'string' column based on spaces
> dat_separated <- dat |>
>   separate(string, into = paste0("S", 1:5), sep = " ",
>            fill = "right", extra = "merge")
>
> Tim
>
>
> -----Original Message-----
> From: R-help <r-help-boun...@r-project.org> On Behalf Of Val
> Sent: Friday, July 19, 2024 12:52 PM
> To: r-help@R-project.org (r-help@r-project.org) <r-help@r-project.org>
> Subject: [R] Extract
>
> [External Email]
>
> Hi All,
>
> I want to extract new variables from a string and add it to the dataframe.
> Sample data is csv file.
>
> dat<-read.csv(text="Year, Sex,string
> 2002,F,15 xc Ab
> 2003,F,14
> 2004,M,18 xb 25 35 21
> 2005,M,13 25
> 2006,M,14 ac 256 AV 35
> 2007,F,11",header=TRUE)
>
> The string column has a maximum of five variables. Some rows have all and
> others may not have all the five variables. If missing then fill it with NA,
> Desired result is shown below,
>
>
> Year,Sex,string, S1, S2, S3 S4,S5
> 2002,F,15 xc Ab, 15,xc,Ab, NA, NA
> 2003,F,14, 14,NA,NA,NA,NA
> 2004,M,18 xb 25 35 21,18, xb, 25, 35, 21
> 2005,M,13 25,13, 25,NA,NA,NA
> 2006,M,14 ac 256 AV 35, 14, ac, 256, AV, 35
> 2007,F,11, 11,NA,NA,NA,NA
>
> Any help?
> Thank you in advance.
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.r-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.