Greetings, I've been struggling for some time with a problem concerning a big database that I have to deal with. Since the database is really big, I'll try to illustrate the problem with a small example. Suppose I have the following data:

AA = c(4,4,4,2,2,6,8,9)
A1 = c(3,3,5,5,5,7,11,12)
A2 = c(3,3,5,5,5,7,11,12)
A = cbind(AA, A1, A2)

BB = c(2,2,4,6,6)
B1 = c(5,11,7,13,NA)
B2 = c(3,12,11,NA,NA)
B3 = c(12,13,NA,NA,NA)
B = cbind(BB, B1, B2, B3)

I have to do the following:

1. Create a dummy (binary) variable in a new column of A that indicates whether, for each row:
   a) the value in column AA can be found in BB, and
   b) among the rows of B whose BB matches that value of AA, both A1 and A2 can be found in B1, B2 or B3.

In this example I would have [0,0,1,1,1,0,0,0].

I have been able to do it with some loops; the problem is that, since in the original data A has 2,936,044 rows and B has 14,965, it is taking forever to finish (probably because I am doing it the wrong way).

I would really appreciate any help or advice on how to deal with this.

Thanks!
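
For reference, here is a minimal loop-free sketch of one possible reading of the task, not a definitive solution. It assumes that A1 and A2 must both appear in the same row of B, that all codes are integer-valued (so pasting them into text keys is safe), and that A is built as cbind(AA, A1, A2). The names vals, pairs, keys and dummy, and the "|" separator, are only illustrative. The idea is to turn every B row into keys of the form "BB|x|y" for each pair of its non-missing values, then test every A row against that set with a single %in% call. With the sample data exactly as typed above, this reading gives 0 for the third row (5 does not occur in the B row with BB = 4), so either the reading or the sample data may differ from what was intended.

```r
AA <- c(4, 4, 4, 2, 2, 6, 8, 9)
A1 <- c(3, 3, 5, 5, 5, 7, 11, 12)
A2 <- c(3, 3, 5, 5, 5, 7, 11, 12)
A  <- cbind(AA, A1, A2)

BB <- c(2, 2, 4, 6, 6)
B1 <- c(5, 11, 7, 13, NA)
B2 <- c(3, 12, 11, NA, NA)
B3 <- c(12, 13, NA, NA, NA)
B  <- cbind(BB, B1, B2, B3)

# Value columns of B, and every ordered pair of them
# (a column paired with itself is included, so A1 == A2 is covered).
vals  <- B[, c("B1", "B2", "B3"), drop = FALSE]
pairs <- expand.grid(i = seq_len(ncol(vals)), j = seq_len(ncol(vals)))

# One key "BB|x|y" per B row and column pair; pairs containing NA are skipped.
keys <- unlist(lapply(seq_len(nrow(pairs)), function(k) {
  x  <- vals[, pairs$i[k]]
  y  <- vals[, pairs$j[k]]
  ok <- !is.na(x) & !is.na(y)
  paste(B[ok, "BB"], x[ok], y[ok], sep = "|")
}))

# A row gets 1 when its "AA|A1|A2" key occurs among the keys built from B.
dummy <- as.integer(
  paste(A[, "AA"], A[, "A1"], A[, "A2"], sep = "|") %in% unique(keys)
)
A <- cbind(A, dummy)
```

The lookup set grows with the number of B rows times the number of column pairs (nine here), while the large table A is only scanned once, which is why this kind of approach tends to scale better than looping over the rows of A.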