Here is another way. For data analysis, the long ("idiomatic") result is usually more useful, though the wide result may be preferable when presenting final results.
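As a dependency-free illustration of the two shapes (long and wide), here is a minimal base R sketch. It is only a rough equivalent written under stated assumptions, not the approach used in this thread: the names parts, n, long, padded, and wide are made up for the example, the long-form token column is called value, and the header is typed without the stray space before Sex.

```r
# Sample data from the original question (header written without the stray space).
dat <- read.csv(text = "Year,Sex,string
2002,F,15 xc Ab
2003,F,14
2004,M,18 xb 25 35 21
2005,M,13 25
2006,M,14 ac 256 AV 35
2007,F,11", header = TRUE, stringsAsFactors = FALSE)

# Split each string on spaces: one character vector of tokens per row.
parts <- strsplit(dat$string, " ")
n <- lengths(parts)  # number of tokens in each row

# Long form: one row per (Year, Sex, token); s_name records the token position.
long <- data.frame(
  Year   = rep(dat$Year, n),
  Sex    = rep(dat$Sex, n),
  s_name = paste0("S", sequence(n)),
  value  = unlist(parts),
  stringsAsFactors = FALSE
)

# Wide form: pad each row's tokens with NA up to the longest row,
# then bind the padded columns S1..Sk next to the original columns.
k <- max(n)
padded <- t(vapply(
  parts,
  function(p) c(p, rep(NA_character_, k - length(p))),
  character(k)
))
colnames(padded) <- paste0("S", seq_len(k))
wide <- data.frame(dat, padded, stringsAsFactors = FALSE)
```

The long form keeps one observation per row, so grouped summaries (for example table(long$Year) to count tokens per year) need no special handling of NA padding. The wide form keeps the original string column alongside S1 to S5, which assumes that any missing tokens are always at the end of the string.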

library(dplyr)
library(tidyr)

dat <- read.csv(text =
"Year, Sex,string
2002,F,15 xc Ab
2003,F,14
2004,M,18 xb 25 35 21
2005,M,13 25
2006,M,14 ac 256 AV 35
2007,F,11"
, header = TRUE
, stringsAsFactors = FALSE  # strsplit() needs character, not factor (default before R 4.0)
)

# long form: one row per space-separated token
idiomatic <- (
  dat %>%
    mutate( string = strsplit( string, " " ) ) %>%  # list column of tokens
    unnest( cols = string ) %>%                     # one row per token
    group_by( Year, Sex ) %>%
    mutate( s_name = paste0( "S", seq_along( string ) ) ) %>%  # S1, S2, ... within each row
    ungroup()
)
idiomatic # each row has unique Year, Sex, and s_name

# wide form: one column per s_name, filled with NA where a token is missing
# (spread() is superseded; pivot_wider( names_from = s_name, values_from = string )
# is the current equivalent)
wide <- (
  idiomatic %>%
    spread( s_name, string )
)
wide

On July 19, 2024 11:23:48 AM PDT, Val <valkr...@gmail.com> wrote:
>Thank you and sorry for the confusion.
>The desired result should have 8 variables as a comma separated in
>each line. The string variable is considered as one variable.
>The output of your script is fine for me. Thank you!
>
>On Fri, Jul 19, 2024 at 1:00 PM Ebert,Timothy Aaron <teb...@ufl.edu> wrote:
>>
>> The desired result is odd.
>> 1) It looks like the string is duplicated in the desired result. The first
>> line of data has "15, xc, Ab", and the desired result has "15, xc, Ab, 15,
>> xc, Ab"
>> 2) The example has S1 through S5, but the desired result has data for eight
>> variables in the first line (not five).
>> 3) The desired result has a different number of variables for each line.
>> 4) Are you assuming that all missing data is at the end of the string? If
>> there are 5 variables (S1 .... S5), do you know that "15, xc, Ab" is S1 =
>> 15, S2 = 'xc', and S3 = 'Ab' rather than S2=15, S4='xc' and S5='Ab' ?
>>
>> This isn't exactly what you asked for, but maybe I was confused somewhere.
>> This approach puts string data into variables in order. In this approach one
>> mixes string and numeric data. The string is not duplicated.
>>
>> library(tidyr)
>>
>> dat <- read.csv(text="Year,Sex,string
>> 2002,F,15 xc Ab
>> 2003,F,14
>> 2004,M,18 xb 25 35 21
>> 2005,M,13 25
>> 2006,M,14 ac 256 AV 35
>> 2007,F,11", header=TRUE, stringsAsFactors=FALSE)
>>
>> # split the 'string' column based on spaces
>> dat_separated <- dat |>
>>   separate(string, into = paste0("S", 1:5), sep = " ",
>>            fill = "right", extra = "merge")
>>
>> Tim
>>
>>
>> -----Original Message-----
>> From: R-help <r-help-boun...@r-project.org> On Behalf Of Val
>> Sent: Friday, July 19, 2024 12:52 PM
>> To: r-help@R-project.org (r-help@r-project.org) <r-help@r-project.org>
>> Subject: [R] Extract
>>
>> Hi All,
>>
>> I want to extract new variables from a string and add it to the dataframe.
>> Sample data is csv file.
>>
>> dat<-read.csv(text="Year, Sex,string
>> 2002,F,15 xc Ab
>> 2003,F,14
>> 2004,M,18 xb 25 35 21
>> 2005,M,13 25
>> 2006,M,14 ac 256 AV 35
>> 2007,F,11",header=TRUE)
>>
>> The string column has a maximum of five variables. Some rows have all and
>> others may not have all the five variables. If missing then fill it with
>> NA, Desired result is shown below,
>>
>>
>> Year,Sex,string, S1, S2, S3 S4,S5
>> 2002,F,15 xc Ab, 15,xc,Ab, NA, NA
>> 2003,F,14, 14,NA,NA,NA,NA
>> 2004,M,18 xb 25 35 21,18, xb, 25, 35, 21
>> 2005,M,13 25,13, 25,NA,NA,NA
>> 2006,M,14 ac 256 AV 35, 14, ac, 256, AV, 35
>> 2007,F,11, 11,NA,NA,NA,NA
>>
>> Any help?
>> Thank you in advance.
>>
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.r-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

--
Sent from my phone. Please excuse my brevity.