[R] finding both rows that are duplicated in a data frame

Robert Lynch Sat, 07 Sep 2013 00:07:38 -0700

I have a data frame that looks like

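set.seed(1)  # fix the random draws so the example is reproducible
id1<-c(1,1,2,2,3,3,4,5,5,6,6,7,8,9,9,10)
id2<-c(22,22,34,34,15,15,76,45,45,84,84,37,52,66,66,91)
GENDER<-sample(c("G-UNK","G-M","G-F"),16, replace = TRUE)
ETH <-sample(c("E-AF","E-UNK","E-VT"),16, replace = TRUE)
# data.frame() rather than cbind(), which would return a character matrix
example<-data.frame(id1,id2,GENDER,ETH, stringsAsFactors = FALSE)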


where there are two IDs and some duplicate entries for IDs that have
different GENDER or ETH(nicity) values.
I would like to get a data frame that doesn't have the duplicates, where the
row that is kept has whichever GENDER is not G-UNK (unknown) and whichever
ETH is not E-UNK.

The resulting data frame should have 10 rows with no *-UNK in either of the
last two columns (unless both entries were UNK).

Yes, the example data may have some impossible results, but it does capture
the important aspects:
1) G-UNK is alphabetically last of G-F, G-M & G-UNK
2) E-UNK is in the middle alphabetically
3) sometimes the first entry is the unknown gender, sometimes it is the
second (likely to happen with a random sample)
4) sometimes both entries for one variable, GENDER or ETH, are unknown
5) there appear to be only two of each row (not 100% sure)
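
To make the requirement concrete, here is a minimal base R sketch of one way it
could be expressed. It is not a definitive solution. The helper pick_known() and
the small fixed data frame toy are hypothetical names introduced only for
illustration. The sketch assumes the columns are character vectors with no NA
values, and that when a group has two different known values, keeping the first
one is acceptable.

```r
# Hypothetical helper: return the first value that is not the "unknown" code,
# or the unknown code itself when every value in the group is unknown.
pick_known <- function(x, unknown) {
  known <- x[x != unknown]
  if (length(known) > 0) known[1] else unknown
}

# Small fixed data set (not the random data above) covering the cases listed:
# unknown first, unknown second, both unknown, and an id with a single row.
toy <- data.frame(
  id1    = c(1, 1, 2, 2, 3),
  id2    = c(22, 22, 34, 34, 15),
  GENDER = c("G-UNK", "G-M", "G-F", "G-UNK", "G-UNK"),
  ETH    = c("E-AF", "E-UNK", "E-UNK", "E-UNK", "E-VT"),
  stringsAsFactors = FALSE
)

# Split the rows by the (id1, id2) pair and collapse each group to one row.
key <- paste(toy$id1, toy$id2)
result <- do.call(rbind, lapply(split(toy, key), function(g) {
  data.frame(
    id1    = g$id1[1],
    id2    = g$id2[1],
    GENDER = pick_known(g$GENDER, "G-UNK"),
    ETH    = pick_known(g$ETH, "E-UNK"),
    stringsAsFactors = FALSE
  )
}))

# split() orders groups by the character key, so restore numeric id order.
result <- result[order(result$id1, result$id2), ]
rownames(result) <- NULL
print(result)
```

Because each group is collapsed with pick_known(), the alphabetical position of
the *-UNK codes does not matter, and groups with more than two rows are handled
the same way as pairs.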

Thanks!
 Robert


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
