On Mon, Nov 11, 2013 at 8:04 PM, Lopez, Dan <lopez...@llnl.gov> wrote: > Below is how I am currently doing this. Is there a more efficient way to do > this? > The scenario is that I have two dataframes of different sizes. I need to > update one binary factor variable in one of those dataframes by matching on > two variables. If there is no match keep as is otherwise update. Also the > variable being update, TT in this case should remain a binary factor variable > (levels='HC','TER') > > HTDF2<-merge(H_DF,T_DF,by=c("FY","ID"),all.x=T) > HTDF2$TT<-factor(ifelse(is.na(HTDF2$TT.y),HTDF2$TT.x,HTDF2$TT.y),labels=c("HC","TER")) > HTDF2<-HTDF2[,-(3:4)] > > > # REPRODUCIBLE EXAMPLE DATA FOR ABOVE.. >> dput(H_DF) > structure(list(FY = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, > 5L), .Label = c("FY09", "FY10", "FY11", "FY12", "FY13"), class = "factor"), > ID = c(1, 1, 1, 1, 2, 2, 2, 2, 2), TT = structure(c(1L, 1L, > 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("HC", "TER"), class = "factor")), > .Names = c("FY", > "ID", "TT"), class = "data.frame", row.names = c(1L, 2L, 3L, > 4L, 6L, 7L, 9L, 10L, 11L)) >> dput(T_DF) > structure(list(FY = structure(c(4L, 2L, 5L), .Label = c("FY09", > "FY10", "FY11", "FY12", "FY13"), class = "factor"), ID = c(1, > 2, 2), TT = structure(c(2L, 2L, 2L), .Label = c("HC", "TER"), class = > "factor")), .Names = c("FY", > "ID", "TT"), row.names = c(5L, 8L, 12L), class = "data.frame") >
Here is an sqldf solution: > library(sqldf) > sqldf("select FY, ID, coalesce(t.TT, h.TT) TT from H_DF h left join T_DF t > using(FY, ID)") FY ID TT 1 FY09 1 HC 2 FY10 1 HC 3 FY11 1 HC 4 FY12 1 TER 5 FY09 2 HC 6 FY10 2 TER 7 FY11 2 HC 8 FY12 2 HC 9 FY13 2 TER -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.