Re: [R] Problems using unique function and !duplicated
Hi Jon,

I think you made a mistake in your desired output. If it is indeed a mistake, then this should do:

    test[!duplicated(test[, c("date", "var2")]), ]

HTH,
Ivan

PS: think about dput() when you want to share objects, in this case dput(test)

On 2/28/2011 16:51, JonC wrote:

Hi,

I am trying to simultaneously remove duplicate values from two or more variables in a small R data.frame. I am trying to reproduce the SAS statements from a Proc Sort with Nodupkey, for those familiar with SAS.

Here's my example data:

    test <- read.csv("test.csv", sep=",", as.is=TRUE)
    test
          date var1 var2 num1 num2
    1 28/01/11    a    1  213   71
    2 28/01/11    b    1  141   47
    3 28/01/11    c    2  867  289
    4 29/01/11    a    2  234   78
    5 29/01/11    b    2  666  222
    6 29/01/11    c    2  912  304
    7 30/01/11    a    3  417  139
    8 30/01/11    b    3  108   36
    9 30/01/11    c    2  288   96

I am trying to obtain the following, where duplicates of date AND var2 are removed from the above data.frame.

          date var1 var2 num1 num2
    28/01/2011    a    1  213   71
    28/01/2011    c    2  867  289
    29/01/2011    a    2  234   78
    30/01/2011    c    2  288   96
    30/01/2011    a    3  417  139

If I use the !duplicated function with one variable, everything works fine. However, I wish to remove duplicates of both date and var2.

    test[!duplicated(test$date),]
            date var1 var2 num1 num2
    1 0011-01-28    a    1  213   71
    4 0011-01-29    a    2  234   78
    7 0011-01-30    a    3  417  139

    test2 <- test[!duplicated(test$date),!duplicated(test$var2),]
    Error in `[.data.frame`(test, !duplicated(test$date), !duplicated(test$var2),  :
      undefined columns selected

I get an error. I got different errors when using the unique() function.

Can anybody solve this?

Thanks in advance.

Jon

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de

**
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
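For readers who want to try the two-column approach suggested in this reply, here is a minimal sketch. It assumes the data frame can be rebuilt by hand from the values printed in the question (the file test.csv is not available), and that the column names are passed to c() as quoted strings. The names `test` and `dedup` are illustrative only.

```r
# Rebuild the example data by hand (values copied from the question;
# this stands in for read.csv("test.csv", ...), which we do not have).
test <- data.frame(
  date = rep(c("28/01/11", "29/01/11", "30/01/11"), each = 3),
  var1 = rep(c("a", "b", "c"), times = 3),
  var2 = c(1, 1, 2, 2, 2, 2, 3, 3, 2),
  num1 = c(213, 141, 867, 234, 666, 912, 417, 108, 288),
  num2 = c(71, 47, 289, 78, 222, 304, 139, 36, 96),
  stringsAsFactors = FALSE
)

# duplicated() on a two-column data frame compares whole rows, so a row
# is flagged only when its (date, var2) pair has already appeared.
dedup <- test[!duplicated(test[, c("date", "var2")]), ]

print(dedup)  # keeps original rows 1, 3, 4, 7 and 9
```

The kept rows contain the same five (date, var2) pairs as the desired output in the question, only in the original row order.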
Re: [R] Problems using unique function and !duplicated
Jon,

you need to combine the conditions into one logical vector, e.g. cond1 & cond2, which here would be:

    !duplicated(test$date) & !duplicated(test$var2)

However, I doubt that this is what you want: you remove too many rows (rows whose single values appeared already, even if the combination is unique).

Have a look at the wiki, though:
http://rwiki.sciviews.org/doku.php?id=tips:data-frames:count_and_extract_unique_rows

Claudia

--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste
phone: +39 0 40 5 58-37 68
email: cbelei...@units.it
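The warning in this reply can be checked on the example data. The sketch below is only an illustration under the assumption that the data frame is rebuilt by hand from the question; `keep_and` and `keep_key` are hypothetical names. It also hints at why the original call failed: as far as I can tell, `[` on a data frame treats its second argument as a column selector, so a length-9 logical vector cannot be matched against 5 columns.

```r
# Example data rebuilt by hand from the values in the question.
test <- data.frame(
  date = rep(c("28/01/11", "29/01/11", "30/01/11"), each = 3),
  var1 = rep(c("a", "b", "c"), times = 3),
  var2 = c(1, 1, 2, 2, 2, 2, 3, 3, 2),
  num1 = c(213, 141, 867, 234, 666, 912, 417, 108, 288),
  num2 = c(71, 47, 289, 78, 222, 304, 139, 36, 96),
  stringsAsFactors = FALSE
)

# Combining the two single-column conditions with & keeps a row only if
# BOTH its date and its var2 are first occurrences on their own.
keep_and <- !duplicated(test$date) & !duplicated(test$var2)

# Testing the pair as one key keeps the first row of each combination.
keep_key <- !duplicated(test[, c("date", "var2")])

which(keep_and)  # rows 1 and 7 only: too many rows are dropped
which(keep_key)  # rows 1, 3, 4, 7 and 9
```

Both vectors are ordinary row selectors, so either can be used as `test[keep_key, ]` in the first (row) position of the subscript.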
Re: [R] Problems using unique function and !duplicated
The following gives what you state you wish to obtain (though not quite in the same order of rows). Call the original dataframe 'df':

    df
    #       date var1 var2 num1 num2
    # 1 28/01/11    a    1  213   71
    # 2 28/01/11    b    1  141   47
    # 3 28/01/11    c    2  867  289
    # 4 29/01/11    a    2  234   78
    # 5 29/01/11    b    2  666  222
    # 6 29/01/11    c    2  912  304
    # 7 30/01/11    a    3  417  139
    # 8 30/01/11    b    3  108   36
    # 9 30/01/11    c    2  288   96

    ix <- which(duplicated(data.frame(df$date, df$var2)))
    ix
    # [1] 2 5 6 8

    df[-ix,]
    #       date var1 var2 num1 num2
    # 1 28/01/11    a    1  213   71
    # 3 28/01/11    c    2  867  289
    # 4 29/01/11    a    2  234   78
    # 7 30/01/11    a    3  417  139
    # 9 30/01/11    c    2  288   96

Does this help?
Ted.

PS I'm posting this from a temporarily subscribed alternative address (for testing purposes) instead of my usual ted.hard...@wlandres.net

E-Mail: (Ted Harding) e...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 28-Feb-11  Time: 16:19:59
-- XFMail --
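On the row-order remark in this reply: the order requested in the question looks like the data sorted by date and then var2, which is roughly what a sort-then-deduplicate step would produce. The sketch below is one possible way to get that order, not something proposed in the thread. It assumes the data frame is rebuilt by hand, that the dates are dd/mm/yy strings, and uses hypothetical names (`d`, `sorted`, `result`).

```r
# Example data rebuilt by hand from the values in the question.
test <- data.frame(
  date = rep(c("28/01/11", "29/01/11", "30/01/11"), each = 3),
  var1 = rep(c("a", "b", "c"), times = 3),
  var2 = c(1, 1, 2, 2, 2, 2, 3, 3, 2),
  num1 = c(213, 141, 867, 234, 666, 912, 417, 108, 288),
  num2 = c(71, 47, 289, 78, 222, 304, 139, 36, 96),
  stringsAsFactors = FALSE
)

# Parse the dd/mm/yy strings so the sort is chronological, not alphabetical.
d <- as.Date(test$date, format = "%d/%m/%y")

# Sort by the key columns first; order() leaves ties in their original order.
sorted <- test[order(d, test$var2), ]

# Then keep the first row of each (date, var2) pair.
result <- sorted[!duplicated(sorted[, c("date", "var2")]), ]

print(result)  # original rows 1, 3, 4, 9, 7, in that order
```

Because ties keep their original order during the sort, the row retained for each key is the same one that `df[-ix,]` keeps; only the ordering of the result changes.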