Re: [R] Problems using unique function and !duplicated

2011-02-28 Thread Ivan Calandra

Hi Jon,

I think you made a mistake in your desired output.
If it is indeed a mistake, then this should do:

test[!duplicated(test[, c("date", "var2")]), ]

HTH,
Ivan

PS: think about dput() when you want to share objects, in this case 
dput(test)
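
For anyone reading along, here is a minimal, self-contained sketch of the approach above. It assumes the data frame has the columns shown in Jon's printout; the values are retyped by hand from that printout, and the name `deduped` is introduced here only for illustration.

```r
# Rebuild the example data by hand (values retyped from the printout in the
# question; this is an assumption about what test.csv contains).
test <- data.frame(
  date = rep(c("28/01/11", "29/01/11", "30/01/11"), each = 3),
  var1 = rep(c("a", "b", "c"), times = 3),
  var2 = c(1, 1, 2, 2, 2, 2, 3, 3, 2),
  num1 = c(213, 141, 867, 234, 666, 912, 417, 108, 288),
  num2 = c(71, 47, 289, 78, 222, 304, 139, 36, 96),
  stringsAsFactors = FALSE
)

# duplicated() on a data frame compares whole rows, so passing only the key
# columns flags every row whose (date, var2) pair has already appeared.
deduped <- test[!duplicated(test[, c("date", "var2")]), ]
deduped
```

The first row of each (date, var2) pair is the one that is kept, in the current row order, so sort the data frame beforehand if a different representative row is wanted.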



On 2/28/2011 16:51, JonC wrote:

Hi, I am trying to remove rows that are duplicated on two or more
variables simultaneously in a small R data.frame. I am trying to reproduce the SAS
statements from a Proc Sort with Nodupkey for those familiar with SAS.

Here's my example data :

test <- read.csv("test.csv", sep = ",", as.is = TRUE)

test

      date var1 var2 num1 num2
1 28/01/11    a    1  213   71
2 28/01/11    b    1  141   47
3 28/01/11    c    2  867  289
4 29/01/11    a    2  234   78
5 29/01/11    b    2  666  222
6 29/01/11    c    2  912  304
7 30/01/11    a    3  417  139
8 30/01/11    b    3  108   36
9 30/01/11    c    2  288   96

I am trying to obtain the following, where duplicates of date AND var2 are
removed from the above data.frame.

date        var1  var2  num1  num2
28/01/2011  a     1     213   71
28/01/2011  c     2     867   289
29/01/2011  a     2     234   78
30/01/2011  c     2     288   96
30/01/2011  a     3     417   139



If I use the !duplicated function with one variable everything works fine.
However I wish to remove duplicates of both Date and var2.

test[!duplicated(test$date), ]
        date var1 var2 num1 num2
1 0011-01-28    a    1  213   71
4 0011-01-29    a    2  234   78
7 0011-01-30    a    3  417  139

test2 <- test[!duplicated(test$date), !duplicated(test$var2), ]
Error in `[.data.frame`(test, !duplicated(test$date),
!duplicated(test$var2),  :   undefined columns selected

I get an error ?
I got different errors when using the unique() function.

Can anybody solve this ?

Thanks in advance.

Jon




--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de

**
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problems using unique function and !duplicated

2011-02-28 Thread Claudia Beleites

Jon,

you need to combine the conditions into one logical value, i.e. cond1 & cond2, 
e.g. !duplicated(test$date) & !duplicated(test$var2)


However, I doubt that this is what you want: you remove too many rows (rows 
whose single values appeared already, even if the combination is unique).
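
To make the difference concrete, here is a small sketch comparing the two tests. It assumes the key columns are as shown in Jon's printout (values retyped by hand); the names `separate` and `combined` are introduced here only for illustration.

```r
# Key columns only, retyped from the printout in the question.
test <- data.frame(
  date = rep(c("28/01/11", "29/01/11", "30/01/11"), each = 3),
  var2 = c(1, 1, 2, 2, 2, 2, 3, 3, 2)
)

# Two separate tests joined with `&`: a row survives only if BOTH its date
# and its var2 value are being seen for the first time.
separate <- !duplicated(test$date) & !duplicated(test$var2)
which(separate)   # rows 1 and 7

# One test on the pair: a row survives if the (date, var2) combination is new.
combined <- !duplicated(test[, c("date", "var2")])
which(combined)   # rows 1, 3, 4, 7 and 9
```

Rows 3, 4 and 9 carry a new combination but a date or a var2 value that has already occurred on its own, which is why the first form drops them.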


Have a look at the wiki, though: 
http://rwiki.sciviews.org/doku.php?id=tips:data-frames:count_and_extract_unique_rows


Claudia







--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it



Re: [R] Problems using unique function and !duplicated

2011-02-28 Thread Ted Harding

The following gives what you state you wish to obtain (though
not quite in the same order of rows). Call the original data frame 'df':

  df
  #       date var1 var2 num1 num2
  # 1 28/01/11    a    1  213   71
  # 2 28/01/11    b    1  141   47
  # 3 28/01/11    c    2  867  289
  # 4 29/01/11    a    2  234   78
  # 5 29/01/11    b    2  666  222
  # 6 29/01/11    c    2  912  304
  # 7 30/01/11    a    3  417  139
  # 8 30/01/11    b    3  108   36
  # 9 30/01/11    c    2  288   96

  ix <- which(duplicated(data.frame(df$date, df$var2)))
  ix
  # [1] 2 5 6 8

  df[-ix, ]
  #       date var1 var2 num1 num2
  # 1 28/01/11    a    1  213   71
  # 3 28/01/11    c    2  867  289
  # 4 29/01/11    a    2  234   78
  # 7 30/01/11    a    3  417  139
  # 9 30/01/11    c    2  288   96
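
One caveat with negative positions may be worth a sketch: if no rows are duplicated, `ix` is empty and `df[-ix, ]` selects zero rows rather than all of them. The example below assumes the key columns from the printout (retyped by hand); `by_position`, `by_logical`, `no_dups` and `empty_ix` are illustrative names, not part of the original post.

```r
# Key columns only, retyped from the printout above.
df <- data.frame(
  date = rep(c("28/01/11", "29/01/11", "30/01/11"), each = 3),
  var2 = c(1, 1, 2, 2, 2, 2, 3, 3, 2)
)

# Positional form, as in the post above.
ix <- which(duplicated(data.frame(df$date, df$var2)))
by_position <- df[-ix, ]

# Logical form: keeps the same rows for this data.
by_logical <- df[!duplicated(df[, c("date", "var2")]), ]

# A subset with no duplicated rows: the index of duplicates is empty.
no_dups <- df[c(1, 3, 4), ]
empty_ix <- which(duplicated(no_dups))

nrow(no_dups[-empty_ix, ])             # 0: -integer(0) selects nothing
nrow(no_dups[!duplicated(no_dups), ])  # 3: the logical form keeps every row
```

For data that is known to contain duplicates, as here, both forms return the same rows; the logical form is simply the safer default inside a function or script.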

Does this help?
Ted.
PS I'm posting this from a temporarily subscribed alternative
address (for testing purposes) instead of my usual
ted.hard...@wlandres.net


E-Mail: (Ted Harding) e...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 28-Feb-11   Time: 16:19:59
-- XFMail --
