Thomas,

You are very clever! The "meil2" data frame contains every combination of the common variables twice:

> meil2
   dist sexe style     meil
1    38    F  clas 02:43:17
2    38    F  free 02:24:46
3    38    H  clas 02:37:36
4    38    H  free 01:59:35
5    45    F  clas 03:46:15
6    45    F  free 02:20:15
7    45    H  clas 02:30:07
8    45    H  free 01:59:36
9    38    F  clas 02:43:17
10   38    F  free 02:24:46
11   38    H  clas 02:37:36
12   38    H  free 01:59:35
13   45    F  clas 03:46:15
14   45    F  free 02:20:15
15   45    H  clas 02:30:07
16   45    H  free 01:59:36

Once I kept only the unique combinations, it merged correctly with the other data frame.

The merge() function is more subtle than I first thought. When merging two data frames (with the default all = FALSE), if the result has more rows than either of the original data frames, there must be duplicate combinations of the common variables in one or both of them.

Thank you very much, I will try to be more careful about this.

Rock


Thomas Lumley wrote:
>
> On Fri, 8 May 2009, Rock Ouimet wrote:
>
>> I am new to R (ex SAS user), and I cannot merge two data frames without
>> getting duplicated rows in the results. How to avoid this happening
>> without using the unique() function?
>>
>> 1. First data frame is called "tmv" with 6 variables and 239 rows:
>>
>> > tmv[1:10,]
>>       temps       nom        prenom sexe dist style
>> 1  01:59:36       Cyr         Steve    H   45  free
>> 2  02:09:55  Gosselin         Erick    H   45  free
>> 3  02:12:18 Desfosses         Sacha    H   45  free
>> 4  02:12:23  Lapointe     Sebastien    H   45  free
>> 5  02:12:52    Labrie        Michel    H   45  free
>> 6  02:12:54   Leblanc        Michel    H   45  free
>> 7  02:13:02 Thibeault       Sylvain    H   45  free
>> 8  02:13:49    Martel      Stephane    H   45  free
>> 9  02:14:03    Lavoie Jean-Philippe    H   45  free
>> 10 02:14:05    Boivin   Jean-Claude    H   45  free
>>
>> Its structure is:
>> > str(tmv)
>> 'data.frame': 239 obs. of 6 variables:
>>  $ temps :Class 'times' atomic [1:239] 0.0831 0.0902 0.0919 0.0919 0.0923 ...
>>   .. ..- attr(*, "format")= chr "h:m:s"
>>  $ nom   : Factor w/ 167 levels "Aubut","Audy",..: 45 84 55 105 98 110 158 117 109 22 ...
>>  $ prenom: Factor w/ 135 levels "Alain","Alexandre",..: 128 33 121 122 93 93 130 126 63 59 ...
>>  $ sexe  : Factor w/ 2 levels "F","H": 2 2 2 2 2 2 2 2 2 2 ...
>>  $ dist  : int 45 45 45 45 45 45 45 45 45 45 ...
>>  $ style : Factor w/ 2 levels "clas","free": 2 2 2 2 2 2 2 2 2 2 ...
>>
>>
>> 2. The second data frame is called "meil2" with 4 variables and 16 rows:
>> > meil2[1:10,]
>>    dist sexe style     meil
>> 1    38    F  clas 02:43:17
>> 2    38    F  free 02:24:46
>> 3    38    H  clas 02:37:36
>> 4    38    H  free 01:59:35
>> 5    45    F  clas 03:46:15
>> 6    45    F  free 02:20:15
>> 7    45    H  clas 02:30:07
>> 8    45    H  free 01:59:36
>> 9    38    F  clas 02:43:17
>> 10   38    F  free 02:24:46
>
>
> Lines 9 and 1 appear to be the same in meil2, as do 2 and 10. If the 16
> rows consist of two repeats of 8 rows that would explain why you are
> getting two copies of each individual in the output. unique(meil2) would
> have just the distinct rows.
>
> -thomas
>
> Thomas Lumley                 Assoc. Professor, Biostatistics
> tlum...@u.washington.edu      University of Washington, Seattle
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

--
View this message in context: http://www.nabble.com/Merging-two-data-frames-with-3-common-variables-makes-duplicated-rows-tp23454018p23459790.html
Sent from the R help mailing list archive at Nabble.com.
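
P.S. For anyone finding this thread later, here is a minimal sketch of the behaviour discussed above. The data frames "results" and "best" are small made-up stand-ins, not the original "tmv" and "meil2"; only merge(), unique(), rbind() and anyDuplicated() from base R are used.

```r
# Hypothetical stand-ins for the data frames in this thread (not the real data).
results <- data.frame(
  nom   = c("A", "B", "C"),
  sexe  = c("H", "H", "F"),
  dist  = c(45L, 45L, 38L),
  style = c("free", "free", "clas")
)
best <- data.frame(
  dist  = c(45L, 38L),
  sexe  = c("H", "F"),
  style = c("free", "clas"),
  meil  = c("01:59:36", "02:43:17")
)

# Stack the lookup table on itself so each key combination appears twice,
# which mimics the accidental duplication described in the thread.
best2 <- rbind(best, best)

keys <- c("dist", "sexe", "style")

# anyDuplicated() returns 0 when no key combination is repeated.
has_dup_keys <- anyDuplicated(best2[, keys]) > 0   # TRUE

# Each row of 'results' matches two rows of 'best2', so the result doubles.
dup_merge <- merge(results, best2, by = keys)
nrow(dup_merge)     # 6

# Dropping the repeated rows first gives one output row per input row.
clean_merge <- merge(results, unique(best2), by = keys)
nrow(clean_merge)   # 3
```

Note that unique() only helps when the repeated rows are identical in every column. If two rows shared the same key combination but had different "meil" values, both would be kept and the merge would still expand, so checking the key columns with anyDuplicated() before merging seems a reasonable habit.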