Re: [R] recode according to specific sequence of characters within a string variable
You can do this with regular expressions. Since you want to extract specific values from the string, I would suggest learning about the gsubfn package; extraction is a bit easier with gsubfn than with the other matching tools.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of D. Alain
> Sent: Friday, February 04, 2011 5:33 AM
> To: r-help@r-project.org
> Subject: [R] recode according to specific sequence of characters within a string variable
>
> Dear R-List,
>
> I have a dataframe with one column "name.of.report" containing character values, e.g.
>
> > df$name.of.report
>
> "jeff_2001_teamx"
> "teamy_jeff_2002"
> "robert_2002_teamz"
> "mary_2002_teamz"
> "2003_mary_teamy"
> ...
> (i.e. the bit of interest is not always at the same position)
>
> Now I want to recode the column "name.of.report" into the variables "person", "year", "team", like this
>
> > new.df
>
> "person" "year" "team"
> jeff     2001   x
> jeff     2002   y
> robert   2002   z
> mary     2002   z
>
> I tried with grep()
>
> df$person<-grep("jeff",df$name.of.report)
>
> but of course it didn't exactly result in what I wanted to do. Could not find any solution via RSeek. Excuse me if it is a very silly question, but can anyone help me find a way out of this?
>
> Thanks a lot
>
> Alain

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
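Greg's reply points to gsubfn without showing code. As a rough sketch of the same extract-by-pattern idea, the example below uses only base R (regmatches() with regexpr(), plus strsplit()) rather than gsubfn itself. That substitution, the helper name extract_first, and the assumption that every string contains exactly one four-digit year and one "team" token are illustrative choices, not something stated in the thread.

```r
x <- c("jeff_2001_teamx", "teamy_jeff_2002", "robert_2002_teamz",
       "mary_2002_teamz", "2003_mary_teamy")

# Hypothetical helper: return the first match of `pattern` in each string.
# regmatches() drops elements with no match, so this assumes every string
# contains the pattern.
extract_first <- function(pattern, text) {
  regmatches(text, regexpr(pattern, text))
}

year <- extract_first("[0-9]{4}", x)
team <- sub("^team", "", extract_first("team[a-z]", x))

# The person is the underscore-separated piece that is neither a year
# nor a team token.
pieces <- strsplit(x, "_", fixed = TRUE)
person <- vapply(pieces,
                 function(p) p[!grepl("^[0-9]{4}$|^team", p)][1],
                 character(1))

new.df <- data.frame(person, year, team, stringsAsFactors = FALSE)
new.df
```

Extracting what you want, instead of deleting everything else, keeps each pattern short, but it would misbehave on a name that itself starts with "team".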
Re: [R] recode according to specific sequence of characters within a string variable
So you want to combine multiple columns back into a single column with the strings pasted together? If that is correct, then look at the paste and sprintf functions (use one or the other, not both).

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Denis Kazakiewicz
> Sent: Friday, February 04, 2011 6:26 AM
> To: Marc Schwartz
> Cc: R-help
> Subject: Re: [R] recode according to specific sequence of characters within a string variable
>
> Dear R people
> Could you please help
> I have a similar but opposite question:
> how do I reshape the data from DF.new back to DF in the example Marc kindly provided?
>
> Thank you
> Denis
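As a minimal sketch of the sprintf route Greg mentions, the example below rebuilds the combined string from the three columns. The DF.new values are retyped from Marc's printed output, and stringsAsFactors = FALSE is an added assumption so that the columns are plain character vectors.

```r
# DF.new as printed in Marc's reply, retyped here so the block runs on its own.
DF.new <- data.frame(person = c("jeff", "jeff", "robert", "mary", "mary"),
                     year   = c("2001", "2002", "2002", "2002", "2003"),
                     team   = c("x", "y", "z", "z", "y"),
                     stringsAsFactors = FALSE)

# One format string describes the whole layout: person_year_team<letter>.
DF.new$name.of.report <- with(DF.new,
                              sprintf("%s_%s_team%s", person, year, team))
DF.new$name.of.report
```

Every string comes out in person_year_team order, so the varying order of the original "name.of.report" values is not recovered; that information is no longer present in DF.new.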
Re: [R] recode according to specific sequence of characters within a string variable
On Feb 4, 2011, at 8:26 AM, Denis Kazakiewicz wrote:

> Dear R people
> Could you please help
> I have a similar but opposite question:
> how do I reshape the data from DF.new back to DF in the example Marc kindly provided?

Well, I don't think you want a random order, right? If what you are asking for is a single character element per row of the data frame, then try this:

apply(DF.new, 1, paste, collapse="_")

--
David.

> Thank you
> Denis

David Winsemius, MD
West Hartford, CT
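To see what the row-wise paste produces, here is a small sketch. The DF.new values are retyped from Marc's printed output, stringsAsFactors = FALSE is an added assumption, and the do.call() variant is an alternative idiom shown for comparison rather than something proposed in the thread.

```r
# DF.new as printed in Marc's reply, retyped here so the block runs on its own.
DF.new <- data.frame(person = c("jeff", "jeff", "robert", "mary", "mary"),
                     year   = c("2001", "2002", "2002", "2002", "2003"),
                     team   = c("x", "y", "z", "z", "y"),
                     stringsAsFactors = FALSE)

# Row-wise: apply() first converts the data frame to a character matrix,
# then pastes the cells of each row together.
by_row <- apply(DF.new, 1, paste, collapse = "_")

# Column-wise: paste() is vectorised, so the columns can be handed to it
# directly as arguments.
by_col <- do.call(paste, c(DF.new, sep = "_"))

by_row
by_col
```

Both give strings such as "jeff_2001_x": the literal "team" prefix is not restored, because it was stripped when the team column was created.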
Re: [R] recode according to specific sequence of characters within a string variable
Dear R people

Could you please help?
I have a similar but opposite question:
how do I reshape the data from DF.new back to DF in the example Marc kindly provided?

Thank you
Denis

On Fri, 2011-02-04 at 07:09 -0600, Marc Schwartz wrote:
> There will be several approaches, all largely involving the use of ?regex. Here is one:
>
> DF <- data.frame(name.of.report = c("jeff_2001_teamx", "teamy_jeff_2002",
>                                     "robert_2002_teamz", "mary_2002_teamz",
>                                     "2003_mary_teamy"))
>
> > DF
>      name.of.report
> 1   jeff_2001_teamx
> 2   teamy_jeff_2002
> 3 robert_2002_teamz
> 4   mary_2002_teamz
> 5   2003_mary_teamy
>
> DF.new <- data.frame(person = gsub("[_0-9]|team.", "", DF$name.of.report),
>                      year = gsub(".*([0-9]{4}).*","\\1", DF$name.of.report),
>                      team = gsub(".*team(.).*","\\1", DF$name.of.report))
>
> > DF.new
>   person year team
> 1   jeff 2001    x
> 2   jeff 2002    y
> 3 robert 2002    z
> 4   mary 2002    z
> 5   mary 2003    y
>
> HTH,
>
> Marc Schwartz
Re: [R] recode according to specific sequence of characters within a string variable
Do you mean something like:

> with(DF.new, paste(person, year, paste("team", team, sep = ""), sep = "_"))
[1] "jeff_2001_teamx"   "jeff_2002_teamy"   "robert_2002_teamz"
[4] "mary_2002_teamz"   "mary_2003_teamy"

? See ?paste and ?with for more information, if so.

HTH,

Marc

On Feb 4, 2011, at 7:26 AM, Denis Kazakiewicz wrote:

> Dear R people
> Could you please help
> I have a similar but opposite question:
> how do I reshape the data from DF.new back to DF in the example Marc kindly provided?
>
> Thank you
> Denis
Re: [R] recode according to specific sequence of characters within a string variable
On Feb 4, 2011, at 6:32 AM, D. Alain wrote:

> Dear R-List,
>
> I have a dataframe with one column "name.of.report" containing character values, e.g.
>
> > df$name.of.report
>
> "jeff_2001_teamx"
> "teamy_jeff_2002"
> "robert_2002_teamz"
> "mary_2002_teamz"
> "2003_mary_teamy"
> ...
> (i.e. the bit of interest is not always at the same position)
>
> Now I want to recode the column "name.of.report" into the variables "person", "year", "team", like this
>
> > new.df
>
> "person" "year" "team"
> jeff     2001   x
> jeff     2002   y
> robert   2002   z
> mary     2002   z
>
> I tried with grep()
>
> df$person<-grep("jeff",df$name.of.report)
>
> but of course it didn't exactly result in what I wanted to do. Could not find any solution via RSeek. Excuse me if it is a very silly question, but can anyone help me find a way out of this?
>
> Thanks a lot
>
> Alain

There will be several approaches, all largely involving the use of ?regex. Here is one:

DF <- data.frame(name.of.report = c("jeff_2001_teamx", "teamy_jeff_2002",
                                    "robert_2002_teamz", "mary_2002_teamz",
                                    "2003_mary_teamy"))

> DF
     name.of.report
1   jeff_2001_teamx
2   teamy_jeff_2002
3 robert_2002_teamz
4   mary_2002_teamz
5   2003_mary_teamy

DF.new <- data.frame(person = gsub("[_0-9]|team.", "", DF$name.of.report),
                     year = gsub(".*([0-9]{4}).*","\\1", DF$name.of.report),
                     team = gsub(".*team(.).*","\\1", DF$name.of.report))

> DF.new
  person year team
1   jeff 2001    x
2   jeff 2002    y
3 robert 2002    z
4   mary 2002    z
5   mary 2003    y

HTH,

Marc Schwartz
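To make the three patterns in Marc's answer easier to follow, the sketch below applies each one to a single string and then shows one caveat. The test string "jeff_teamx" (a report name with no year) is a made-up input for illustration and does not appear in the thread.

```r
s <- "teamy_jeff_2002"

# Delete every underscore, every digit, and "team" plus the character after it;
# whatever is left is taken to be the person.
person <- gsub("[_0-9]|team.", "", s)

# Match the whole string but keep only the captured run of four digits.
year <- gsub(".*([0-9]{4}).*", "\\1", s)

# Match the whole string but keep only the single character after "team".
team <- gsub(".*team(.).*", "\\1", s)

c(person = person, year = year, team = team)

# Caveat: if the pattern does not match at all, gsub() returns the input
# unchanged rather than NA, so a missing year goes unnoticed.
no_year <- gsub(".*([0-9]{4}).*", "\\1", "jeff_teamx")
no_year

# The year is still text at this point; convert it if a number is needed.
year_num <- as.integer(year)
```

The person pattern works by elimination, so it relies on names containing no digits and no "team" substring.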