Re: [R] Match strings across two differently sized dataframes and copy corresponding row to dataframe

2011-06-30 Thread jim holtman
?merge
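
To make the pointer above concrete, here is a minimal sketch of how merge() might be applied to this problem. The two data frames are made-up stand-ins that borrow the column names from the question; the `id` column and all values are assumptions added for illustration, not part of the original data.

```r
# Toy stand-ins for the two data frames in the question (made-up values).
comments <- data.frame(
  ImproveOne = c("more staff", "better food", "shorter waits"),
  ImproveCat = c(1, 2, 3),
  stringsAsFactors = FALSE
)
mydata <- data.frame(
  id      = 1:5,  # hypothetical row identifier, used to restore row order
  Improve = c("better food", "more staff", "quieter ward",
              "shorter waits", "more staff"),
  stringsAsFactors = FALSE
)

# Left join: keep every row of mydata and attach the code wherever the
# string has an exact match in comments; unmatched rows get NA.
merged <- merge(mydata, comments[, c("ImproveOne", "ImproveCat")],
                by.x = "Improve", by.y = "ImproveOne", all.x = TRUE)

# merge() sorts the result on the key column by default, so put the
# rows back in their original order.
merged <- merged[order(merged$id), ]
print(merged)
```

Note that if a string occurred more than once in comments, merge() would return one output row per matching pair, so the lookup table should be de-duplicated first.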

On Thu, Jun 30, 2011 at 9:35 AM, Chris Beeley chris.bee...@gmail.com wrote:
 Hello-

 Sorry, this is a bit of a noob question, but I can't seem to progress
 it any further.

 I have two dataframes which contain a series of strings which exactly
 match. The problem is one has more rows than the other (more cases
 have been added) and they have been sorted so that they are not in the
 same order. The smaller dataframe, though, contains another column
 with codes classifying the strings.

 So, for every row of the larger dataframe, I want to look up the
 string in the smaller dataframe, and then use that row number to copy
 across the code for the string into the larger dataframe. Here's my
 idea so far:

 # comments is the smaller dataframe with the codes; mydata is the
 # larger dataframe to which I would like to copy them.

 # match between the strings one way
 commvec=charmatch(comments$ImproveOne, mydata$Improve)
 # match the other way
 datavec=charmatch(mydata$Improve, comments$ImproveOne)

 mydata$ImproveCat1=NA # produce a variable to hold the copied codes

 # for all the non-missing row numbers identified in the larger dataframe,
 # copy the corresponding code from the smaller dataframe (which lives
 # in comments$ImproveCat)
 mydata$ImproveCat1[datavec[!is.na(datavec)]]=
 comments$ImproveCat[commvec[!is.na(commvec)]]

 However, the last command doesn't work because the variables are not
 the same length. They nearly are, though; I'm not sure whether that's a
 coincidence or a sign that I'm close.

 length(mydata$ImproveCat1[datavec[!is.na(datavec)]]) # yields 1567

 length(comments$ImproveCat[commvec[!is.na(commvec)]]) # yields 1512

 I'm sorry, I did try to construct an example dataframe, but ironically
 I can't make that work either! Sorry!

 Any help gratefully received.

 Many thanks!

 Chris Beeley
 Institute of Mental Health, UK

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



[R] Match strings across two differently sized dataframes and copy corresponding row to dataframe

2011-06-30 Thread Chris Beeley
Hello-

Sorry, this is a bit of a noob question, but I can't seem to progress
it any further.

I have two dataframes which contain a series of strings which exactly
match. The problem is one has more rows than the other (more cases
have been added) and they have been sorted so that they are not in the
same order. The smaller dataframe, though, contains another column
with codes classifying the strings.

So, for every row of the larger dataframe, I want to look up the
string in the smaller dataframe, and then use that row number to copy
across the code for the string into the larger dataframe. Here's my
idea so far:

# comments is the smaller dataframe with the codes; mydata is the
# larger dataframe to which I would like to copy them.

# match between the strings one way
commvec=charmatch(comments$ImproveOne, mydata$Improve)
# match the other way
datavec=charmatch(mydata$Improve, comments$ImproveOne)

mydata$ImproveCat1=NA # produce a variable to hold the copied codes

# for all the non-missing row numbers identified in the larger dataframe,
# copy the corresponding code from the smaller dataframe (which lives
# in comments$ImproveCat)
mydata$ImproveCat1[datavec[!is.na(datavec)]]=
comments$ImproveCat[commvec[!is.na(commvec)]]

However, the last command doesn't work because the variables are not
the same length. They nearly are, though; I'm not sure whether that's a
coincidence or a sign that I'm close.

length(mydata$ImproveCat1[datavec[!is.na(datavec)]]) # yields 1567

length(comments$ImproveCat[commvec[!is.na(commvec)]]) # yields 1512
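
A likely reason for the mismatch, offered as a sketch and not as a diagnosis of the real data: the two charmatch() calls count different things. One counts rows of mydata that have a match in comments, the other counts rows of comments that have a match in mydata, and these differ whenever a string is repeated in the larger dataframe. charmatch() also does partial matching and returns 0 for ambiguous matches, which can blur the counts further. Because the strings are said to match exactly, a single match() call in one direction may be enough. The toy data below is made up and only reuses the column names from the post.

```r
# Toy stand-ins for the two data frames (made-up values).
comments <- data.frame(
  ImproveOne = c("more staff", "better food", "shorter waits"),
  ImproveCat = c(1, 2, 3),
  stringsAsFactors = FALSE
)
mydata <- data.frame(
  Improve = c("better food", "more staff", "quieter ward",
              "shorter waits", "more staff"),
  stringsAsFactors = FALSE
)

# For each row of mydata, find the row of comments holding the same
# string (exact matching only); NA where there is no match.
idx <- match(mydata$Improve, comments$ImproveOne)

# Indexing with idx yields one code per row of mydata, in mydata's own
# order, so the lengths always agree; NA positions simply give NA.
mydata$ImproveCat1 <- comments$ImproveCat[idx]
print(mydata)
```

Here idx has the same length as mydata by construction, so no separate subsetting with !is.na() is needed on either side of the assignment.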

I'm sorry, I did try to construct an example dataframe, but ironically
I can't make that work either! Sorry!

Any help gratefully received.

Many thanks!

Chris Beeley
Institute of Mental Health, UK
