Re: [R] recode according to specific sequence of characters within a string variable

2011-02-04 Thread Greg Snow
You can do this with regular expressions. Since you want to extract specific 
values from the string, I would suggest learning about the gsubfn package; 
extraction is a bit easier with gsubfn than with the other matching tools. 
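
As a rough illustration of the extraction idea, here is a minimal sketch that 
uses only base R (regexpr, regmatches and strsplit) rather than gsubfn itself. 
The vector x and the assumption that every string holds exactly one four-digit 
year and one "team" token are mine, based on the sample data in the question.

```r
# Sample strings copied from the question
x <- c("jeff_2001_teamx", "teamy_jeff_2002", "robert_2002_teamz",
       "mary_2002_teamz", "2003_mary_teamy")

# Year: the first run of four digits in each string
year <- regmatches(x, regexpr("[0-9]{4}", x))

# Team: match "team" plus one character, then drop the "team" prefix
team <- sub("team", "", regmatches(x, regexpr("team.", x)))

# Person: split on "_" and keep the token that is neither a year nor a team
tokens <- strsplit(x, "_", fixed = TRUE)
person <- vapply(tokens,
                 function(tk) tk[!grepl("^([0-9]{4}|team.)$", tk)],
                 character(1))

new.df <- data.frame(person, year, team, stringsAsFactors = FALSE)
```

Note that regmatches() silently drops elements without a match, so this sketch 
only lines up row by row when every string contains both pieces.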


-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> project.org] On Behalf Of D. Alain
> Sent: Friday, February 04, 2011 5:33 AM
> To: r-help@r-project.org
> Subject: [R] recode according to specific sequence of characters within
> a string variable
> 
> Dear R-List,
> 
> I have a dataframe with one column "name.of.report" containing
> character values, e.g.
> 
> 
> >df$name.of.report
> 
> "jeff_2001_teamx"
> "teamy_jeff_2002"
> "robert_2002_teamz"
> "mary_2002_teamz"
> "2003_mary_teamy"
> ...
> (i.e. the bit of interest is not always at same position)
> 
> Now I want to recode the column "name.of.report" into the variables
> "person", "year","team", like this
> 
> >new.df
> 
> "person"  "year"  "team"
> jeff   2001  x
> jeff   2002  y
> robert   2002  z
> mary    2002  z
> 
> I tried with grep()
> 
> df$person<-grep("jeff",df$name.of.report)
> 
> but of course it didn't exactly result in what I wanted to do. Could
> not find any solution via RSeek. Excuse me if it is a very silly
> question, but can anyone help me find a way out of this?
> 
> Thanks a lot
> 
> Alain

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] recode according to specific sequence of characters within a string variable

2011-02-04 Thread Greg Snow
So you want to combine multiple columns back into a single column with the 
strings pasted together?  If that is correct then look at the paste and sprintf 
functions (use one or the other, not both).
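
For the sprintf route, a minimal sketch might look like the following. The 
two-row DF.new is a cut-down stand-in for the data frame in the thread, and 
the "team" prefix in the format string is an assumption about the layout 
wanted.

```r
# Small stand-in for the DF.new built earlier in the thread
DF.new <- data.frame(person = c("jeff", "mary"),
                     year   = c("2001", "2003"),
                     team   = c("x", "y"),
                     stringsAsFactors = FALSE)

# sprintf() is vectorised: each %s is filled from the matching column, row by row
name.of.report <- sprintf("%s_%s_team%s",
                          DF.new$person, DF.new$year, DF.new$team)
```

The format string keeps the separator and the fixed "team" text in one place, 
which can be easier to read than nested paste() calls.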

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> project.org] On Behalf Of Denis Kazakiewicz
> Sent: Friday, February 04, 2011 6:26 AM
> To: Marc Schwartz
> Cc: R-help
> Subject: Re: [R] recode according to specific sequence of characters
> within a string variable
> 
> Dear R people,
> Could you please help?
> I have a similar but opposite question:
> how can I reshape the data from DF.new back to DF in the example Marc
> kindly provided?
> 
> Thank you
> Denis


Re: [R] recode according to specific sequence of characters within a string variable

2011-02-04 Thread David Winsemius


On Feb 4, 2011, at 8:26 AM, Denis Kazakiewicz wrote:

> Dear R people,
> Could you please help?
> I have a similar but opposite question:
> how can I reshape the data from DF.new back to DF in the example Marc
> kindly provided?


Well, I don't think you want a random order, right? If what you are  
asking for is a single character string per row of the data frame, then  
try this:


apply(DF.new, 1, paste, collapse = "_")
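
A possible alternative, sketched here under my own assumptions, is to paste 
the columns directly with do.call(). apply() first coerces the data frame 
with as.matrix(), which can reformat non-character columns, whereas this 
version converts each column on its own. The small DF.new with a numeric 
year column is made up for the illustration.

```r
# Stand-in data; year is numeric on purpose
DF.new <- data.frame(person = c("jeff", "mary"),
                     year   = c(2001, 2003),
                     team   = c("x", "y"),
                     stringsAsFactors = FALSE)

# c(DF.new, sep = "_") is a list of the columns plus the sep argument,
# so this is the same as paste(person, year, team, sep = "_")
joined <- do.call(paste, c(DF.new, sep = "_"))
```

Either way the result has the form "jeff_2001_x"; the "team" prefix from the 
original strings has to be added separately if it is wanted.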

--
David.



David Winsemius, MD
West Hartford, CT



Re: [R] recode according to specific sequence of characters within a string variable

2011-02-04 Thread Denis Kazakiewicz
Dear R people,
Could you please help?
I have a similar but opposite question:
how can I reshape the data from DF.new back to DF in the example Marc
kindly provided?

Thank you
Denis



Re: [R] recode according to specific sequence of characters within a string variable

2011-02-04 Thread Marc Schwartz
Do you mean something like:

> with(DF.new, paste(person, year, paste("team", team, sep = ""), sep = "_"))
[1] "jeff_2001_teamx"   "jeff_2002_teamy"   "robert_2002_teamz"
[4] "mary_2002_teamz"   "mary_2003_teamy"  

?

See ?paste and ?with for more information, if so.

HTH,

Marc

On Feb 4, 2011, at 7:26 AM, Denis Kazakiewicz wrote:

> Dear R people,
> Could you please help?
> I have a similar but opposite question:
> how can I reshape the data from DF.new back to DF in the example Marc
> kindly provided?
> 
> Thank you
> Denis


Re: [R] recode according to specific sequence of characters within a string variable

2011-02-04 Thread Marc Schwartz

On Feb 4, 2011, at 6:32 AM, D. Alain wrote:

> Dear R-List, 
> 
> I have a dataframe with one column "name.of.report" containing character 
> values, e.g.
> 
> 
>> df$name.of.report
> 
> "jeff_2001_teamx"
> "teamy_jeff_2002"
> "robert_2002_teamz"
> "mary_2002_teamz"
> "2003_mary_teamy"
> ...
> (i.e. the bit of interest is not always at same position)
> 
> Now I want to recode the column "name.of.report" into the variables "person", 
> "year","team", like this
> 
>> new.df
> 
> "person"  "year"  "team"
> jeff   2001  x
> jeff   2002  y
> robert   2002  z
> mary    2002  z
> 
> I tried with grep()
> 
> df$person<-grep("jeff",df$name.of.report)
> 
> but of course it didn't exactly result in what I wanted to do. Could not find 
> any solution via RSeek. Excuse me if it is a very silly question, but can 
> anyone help me find a way out of this?
> 
> Thanks a lot
> 
> Alain


There will be several approaches, all largely involving the use of ?regex. Here 
is one:


DF <- data.frame(name.of.report = c("jeff_2001_teamx", "teamy_jeff_2002",
                                    "robert_2002_teamz", "mary_2002_teamz",
                                    "2003_mary_teamy"))

> DF
     name.of.report
1   jeff_2001_teamx
2   teamy_jeff_2002
3 robert_2002_teamz
4   mary_2002_teamz
5   2003_mary_teamy


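# person: delete underscores, digits and the "team?" token; the name is what remains
DF.new <- data.frame(person = gsub("[_0-9]|team.", "", DF$name.of.report),
                     # year: replace the whole string with its four-digit group
                     year = gsub(".*([0-9]{4}).*", "\\1", DF$name.of.report),
                     # team: replace the whole string with the character after "team"
                     team = gsub(".*team(.).*", "\\1", DF$name.of.report))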


> DF.new
  person year team
1   jeff 2001    x
2   jeff 2002    y
3 robert 2002    z
4   mary 2002    z
5   mary 2003    y
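
One thing to watch with the patterns above: sub() and gsub() return the 
input unchanged when the pattern does not match, so a malformed entry would 
come back whole rather than be flagged. A small sketch of a guard follows; 
the helper name extract_or_na and the "nodata_here" entry are hypothetical, 
added only for illustration.

```r
x <- c("jeff_2001_teamx", "nodata_here", "2003_mary_teamy")

# Without a guard, the unmatched string is returned as is
naive_year <- gsub(".*([0-9]{4}).*", "\\1", x)

# Hypothetical helper: NA where the pattern is absent,
# otherwise the first capture group
extract_or_na <- function(pattern, x) {
  ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
}

year <- extract_or_na(".*([0-9]{4}).*", x)
team <- extract_or_na(".*team(.).*", x)
```

Rows with NA can then be inspected or dropped before building the data frame.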



HTH,

Marc Schwartz
