Re: [R] which rows are duplicates?

Michael Dewey Mon, 30 Mar 2009 03:54:15 -0700

At 05:07 30/03/2009, Aaron M. Swoboda wrote:

I would like to know which rows are duplicates of each other, not
simply that a row is duplicate of another row. In the following
example rows 1 and 3 are duplicates.


> x <- c(1,3,1)
> y <- c(2,4,2)
> z <- c(3,4,3)
> data <- data.frame(x,y,z)
> data
  x y z
1 1 2 3
2 3 4 4
3 1 2 3


Does this do what you want?
> x <- c(1,3,1)
> y <- c(2,4,2)
> z <- c(3,4,3)
> data <- data.frame(x,y,z)
> data.u <- unique(data)
> data.u
  x y z
1 1 2 3
2 3 4 4
> data.u <- cbind(data.u, set = 1:nrow(data.u))
> merge(data, data.u)
  x y z set
1 1 2 3   1
2 1 2 3   1
3 3 4 4   2

You need to do a bit more work to get them back into the original row order, if that is essential, because merge() sorts its result on the matching columns.
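
One possible way to do that extra work, as a sketch only: record the original row number before merging, then sort on it afterwards. The column name "row" and the object name "restored" are made up for this illustration and are not part of the suggestion above.

```r
x <- c(1, 3, 1)
y <- c(2, 4, 2)
z <- c(3, 4, 3)
data <- data.frame(x, y, z)

# Label each distinct row with a set number, as in the reply above
data.u <- unique(data)
data.u$set <- seq_len(nrow(data.u))

# Remember the original position of every row (hypothetical helper column)
data$row <- seq_len(nrow(data))

# merge() matches on the shared columns x, y, z and sorts on them,
# so put the result back into the original order afterwards
m <- merge(data, data.u)
restored <- m[order(m$row), ]
rownames(restored) <- NULL

restored$set   # 1 2 1: rows 1 and 3 belong to the same set
```

Rows that share a value of "set" are duplicates of each other, and "row" still says where each one sat in the original data frame.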

I can't figure out how to get R to tell me that observation 1 and 3
are the same.  It seems like the "duplicated" and "unique" functions
should be able to help me out, but I am stumped.

For instance, if I use "duplicated" ...

> duplicated(data)
[1] FALSE FALSE  TRUE

it tells me that row 3 is a duplicate, but not which row it matches.
How do I figure out WHICH row it matches?

And if I use "unique"...

> unique(data)
  x y z
1 1 2 3
2 3 4 4

I see that rows 1 and 2 are unique, leaving me to infer that row 3 was
a duplicate, but again it doesn't tell me which row it was a duplicate
of (as far as I can tell). Am I missing something?

How can I determine that row 3 is a duplicate OF ROW 1?

Thanks,

Aaron
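
A different approach from the merge() one, offered only as a rough sketch: paste each row into a single string key and let match() report the first row carrying the same key. The names key, first, dup.of and groups are invented for this illustration. It assumes that pasting the columns yields a distinct string for each distinct row, which can fail for values that contain the separator or for numbers that differ only beyond the printed precision.

```r
x <- c(1, 3, 1)
y <- c(2, 4, 2)
z <- c(3, 4, 3)
data <- data.frame(x, y, z)

# One string per row; "\r" is a separator unlikely to occur in the data
key <- do.call(paste, c(data, sep = "\r"))

# For every row, the index of the first row with the same key
first <- match(key, key)

# Keep that index only for rows flagged by duplicated(); NA elsewhere
dup.of <- first
dup.of[!duplicated(key)] <- NA

# Row numbers grouped by the first row they match
groups <- split(seq_len(nrow(data)), first)

first    # 1 2 1
dup.of   # NA NA 1: row 3 is a duplicate of row 1
```

Here groups is a list whose element "1" holds rows 1 and 3 and whose element "2" holds row 2, which answers "which rows are duplicates of each other" without changing the row order.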


Michael Dewey
http://www.aghmed.fsnet.co.uk

