On Mon, Nov 11, 2013 at 8:04 PM, Lopez, Dan <lopez...@llnl.gov> wrote:
> Below is how I am currently doing this. Is there a more efficient way to do 
> this?
> The scenario is that I have two dataframes of different sizes. I need to 
> update one binary factor variable in one of those dataframes by matching on 
> two variables. If there is no match keep as is otherwise update. Also the 
> variable being update, TT in this case should remain a binary factor variable 
> (levels='HC','TER')
>
> HTDF2<-merge(H_DF,T_DF,by=c("FY","ID"),all.x=T)
> HTDF2$TT<-factor(ifelse(is.na(HTDF2$TT.y),HTDF2$TT.x,HTDF2$TT.y),labels=c("HC","TER"))
> HTDF2<-HTDF2[,-(3:4)]
>
>
> # REPRODUCIBLE EXAMPLE DATA FOR ABOVE..
>> dput(H_DF)
> structure(list(FY = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L,
> 5L), .Label = c("FY09", "FY10", "FY11", "FY12", "FY13"), class = "factor"),
>     ID = c(1, 1, 1, 1, 2, 2, 2, 2, 2), TT = structure(c(1L, 1L,
>     1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("HC", "TER"), class = "factor")), 
> .Names = c("FY",
> "ID", "TT"), class = "data.frame", row.names = c(1L, 2L, 3L,
> 4L, 6L, 7L, 9L, 10L, 11L))
>> dput(T_DF)
> structure(list(FY = structure(c(4L, 2L, 5L), .Label = c("FY09",
> "FY10", "FY11", "FY12", "FY13"), class = "factor"), ID = c(1,
> 2, 2), TT = structure(c(2L, 2L, 2L), .Label = c("HC", "TER"), class = 
> "factor")), .Names = c("FY",
> "ID", "TT"), row.names = c(5L, 8L, 12L), class = "data.frame")
>

Here is an sqldf solution:

> library(sqldf)
> sqldf("select FY, ID, coalesce(t.TT, h.TT) TT from H_DF h left join T_DF t 
> using(FY, ID)")
    FY ID  TT
1 FY09  1  HC
2 FY10  1  HC
3 FY11  1  HC
4 FY12  1 TER
5 FY09  2  HC
6 FY10  2 TER
7 FY11  2  HC
8 FY12  2  HC
9 FY13  2 TER


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to