Well, for a start, you might give us a reproducible example that actually runs -- yours doesn't. Did you check? You seem to be missing a final ")". (Also, you do not need to quote the column names in data.frame(), though it works fine if you do.)

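For reference, here is a small check of what the original call builds once the closing parenthesis is added. This is only a sketch of the call as posted (with T spelled out as TRUE); nothing else about it is changed.

```r
# The call as posted, with the missing ")" added
df <- data.frame("id" = rep(1:10, 10, each = 10),
                 "item1" = sample(1:20, 100, replace = TRUE))

length(rep(1:10, 10, each = 10))  # 1000: each value 10 times, whole vector 10 times
nrow(df)                          # 1000: the 100 item1 values are recycled 10 times
```
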
Also note that in df, your id column has length 1000 and your item1 column has length 100, so item1 will be recycled to match id. This seems to me likely to be misspecified. Or is this what you meant?

Finally, note that none of the rows in df2 contains two identical numbers, so no single id in df can match both members of a row. So did you mean that the 'id' and 'item1' values in a row of df must match the corresponding 'a' and 'b' values of some row in df2?

Under the above interpretation and with the following reprex (note the use of set.seed() to make it reproducible)...

set.seed(1234) ## for reproducibility
df = data.frame(id = rep(1:10, each = 10), item1 = sample(1:20, 100, replace = TRUE))
df2 = data.frame(a = c(8, 8, 10, 9, 5, 1, 2, 1), b = c(16, 18, 11, 19, 18, 11, 17, 12))

... you can paste0() the column vectors together in each data frame (as character values) and then just match on the single character vectors, like this:

both <- match(do.call(paste0, df), do.call(paste0, df2))
## subscript the data frames if you have more columns that are not used for matching

> both
  [1] NA NA  8 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [21] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [41] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [61] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [81] NA NA NA NA NA NA NA NA NA NA  3 NA NA NA NA NA NA NA NA  3

Then

> both[!is.na(both)] ## the rows of df2 that matched
[1] 8 3 3
> which(!is.na(both)) ## the rows of df that were matched
[1]   3  91 100

This should be very efficient, as hashing is used for matching. I think there is a slicker way to do this that combines the paste and match steps (other than using merge() or the like, as has already been suggested). But I have forgotten the details, or I may just be thinking of merge(), which may indeed be a better option.

All presuming this is what you meant, of course, which it may not be.

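For what it is worth, here is a rough sketch of the merge() route under the same interpretation, producing something like the Id/Rownr table that was asked for. The Rownr column, the key() helper and the "|" separator are my own illustrative choices, not anything from the original post. The separator guards against a weakness of plain paste0(): without one, the pairs (1, 12) and (11, 2) would both become "112". That cannot happen with the ids used here, but it could with other data.

```r
set.seed(1234)  # same seed as in the reprex
df  <- data.frame(id = rep(1:10, each = 10),
                  item1 = sample(1:20, 100, replace = TRUE))
df2 <- data.frame(a = c(8, 8, 10, 9, 5, 1, 2, 1),
                  b = c(16, 18, 11, 19, 18, 11, 17, 12))

# Record the df2 row number first, because merge() does not preserve row order
df2$Rownr <- seq_len(nrow(df2))

# Inner join: keep rows of df whose (id, item1) equals (a, b) in some row of df2
hits <- merge(df, df2, by.x = c("id", "item1"), by.y = c("a", "b"))
hits[, c("id", "Rownr")]

# Paste/match variant with a separator (key() is a hypothetical helper)
key  <- function(d) do.call(paste, c(unname(as.list(d)), sep = "|"))
both <- match(key(df[c("id", "item1")]), key(df2[c("a", "b")]))
```

With more columns to match on, list them in by.x and by.y (or in the column subsets passed to key()). As before, this only helps if the interpretation above is the right one.
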
:-(

Cheers,
Bert

On Fri, Oct 7, 2022 at 5:57 AM Marine Andersson <marine.anders...@ki.se> wrote:

> Hi,
>
> If I have two datasets like this:
> df=data.frame("id"=rep(1:10,10, each=10), "item1"=sample(1:20, 100,
> replace=T)
> df2=data.frame("a"=c(8, 8,10,9, 5, 1,2,1), "b"=c(16,18,11, 19,18,
> 11,17,12))
>
> How do I find out which ids in the df dataset that has a match for both
> the numbers occuring in the same row in the df2 dataframe? In the output I
> would like to get the matching id and the rownumber from the df2.
>
> Output something like this
> Id Rownr
> 2 1
> 5 1
> 7 4
>
> My actual problem is more complex with even more columns to be matched and
> the datasets are large, hence the solution needs to be efficient.
>
> Kind regards,

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.