I think this does what you want: ## find idiv coloumns in x.HHu.map that don't exist in y.HHo.map x.HHu.map <- x.HHu.map[ c("HHid", "position", names(x.HHu.map)[ !names(x.HHu.map) %in% names(y.HHo.map)] )] ## merge, adding extra column from x.HHu.map zzz <- merge(y.HHo.map, x.HHu.map, by=c('HHid', 'position'), all=T) ## order by HHid zzz <- zzz[order(zzz$HHid),]
Best, Ista On Sun, Jul 12, 2015 at 10:45 AM, aldi <a...@dsgmail.wustl.edu> wrote: > Hi, > I have two sets of data x.HHu and y.HHo, rows are IDs and columns are > individuals. I do not know in advance indv or HHid, both of them will be > captured from the data. As the y.HHo set updates, y.HHo set has better > information then x.HHu set. Thus I want a merge where right set > overwrites left set info based on HHid, i.e. to overwrite x.HHu set with > y.HHo set but keep any extra info from the x.HHu set that is not present > in y.HHo set. > HHids will be complete based on z.map, with the corresponding positions. > I am having trouble with the part after this line: ### > ============================================+++++++++++++++++++++++++++ > I am thinking that I am creating new columns "position" "indv1" and > "indv2", but R is interpreting them as row information. > See the expected final table at the end. HHid is common, indv3 is from > x.HHu, and the rest position and indv1 and indv2 are from y.HHo > Any suggestions are appreciated. > Thank you in advance, > Aldi > > x.HHu<- data.frame( > HHid = c( 'HH1', 'HH2', 'HH3', 'HH4', 'HH5', 'HH10') > , indv1 = c( 2, 0, 2 , 0, 2, 0) > , indv2 = c( 0, NA, 2, 2, 2, 2) > , ind3 = c( 0, 0, 0, 0, 0, 0) > ) > ### the HHo data will be the top set to overwrite any HHu data, when > they exist, thinking that HHo are better than HHu results > ### when they are available > > y.HHo<-data.frame(HHid=c('HH1', 'HH2','HH5', 'HH3', 'HH10') > , indv1 = c(2, 0, 2, 0, NA) > , indv2 = c(0, 2, 2, 1, 2) > ) > > z.map<-data.frame(HHid = c('HH1', 'HH2', 'HH3', 'HH4', 'HH5', > 'HH6','HH8', 'HH7', 'HH9', 'HH10', 'HH11') > , position= c(10,20,30,42,55,66,81,75,92,101,111) > ) > ### see objects > x.HHu > y.HHo > z.map > ### now sort the map by position, this sorted map will be used to sort > finally all data > z.map<-z.map[with(z.map, order(position)), ] > z.map > > ### First I introduce position to both sets so I can sort them in > advance by position. > x.HHu.map <-merge( z.map, x.HHu, by='HHid', all=T) > x.HHu.map<-x.HHu.map[with(x.HHu.map, order(position)), ] > x.HHu.map > > y.HHo.map <-merge( z.map, y.HHo, by='HHid', all= T) > y.HHo.map<-y.HHo.map[with(y.HHo.map, order(position)), ] > y.HHo.map > > ### now merge HHu and HHo with the hope to overwrite the HHu set with > HHo wherever they overlap by column names. > zzz <- merge(x.HHu.map, y.HHo.map, by='HHid', all=T) > zzz > ### find common variable names in two sets > > commonNames <- names(x.impu.map)[which(colnames(x.impu.map) %in% > colnames(y.geno.map))] > > ## remove HHid wich is common for x and y, but work with the rest of columns > commonNames<-commonNames[-c(1)] > > ### ============================================+++++++++++++++++++++++++++ > for(i in 1:length(commonNames)){ > > print(commonNames[i]) > zzz$commonNames[i] <- NA > > print(paste("zzz","$",commonNames[i],".y",sep="")) > > zzz$commonNames[i] <- zzz[,paste(commonNames[i],".y",sep="")] > > ### paste(zzz$commonNames[i],".x",sep='') <- NULL; > ### paste(zzz$commonNames[i],".y",sep='') <- NULL; > > } > zzz > > The final expected set has to be: HHid is common, indv3 is from x.HHu, > and the rest position and indv1 and indv2 are from y.HHo > HHid position ind3 indv1 indv2 > 1 HH1 10 0 2 0 > 2 HH10 101 0 NA 2 > 3 HH11 111 NA NA NA > 4 HH2 20 0 0 2 > 5 HH3 30 0 0 1 > 6 HH4 42 0 NA NA > 7 HH5 55 0 2 2 > 8 HH6 66 NA NA NA > 9 HH7 75 NA NA NA > 10 HH8 81 NA NA NA > 11 HH9 92 NA NA NA > > -- > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.