On Fri, Apr 8, 2011 at 9:59 AM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
> I need a function which is similar to duplicated(), but instead of returning
> TRUE/FALSE, returns indices of which element was duplicated. That is,
>
>> x <- c(9,7,9,3,7)
>> duplicated(x)
> [1] FALSE FALSE TRUE FALSE TRUE
>
>> duplicates(x)
> [1] NA NA 1 NA 2
>
> (so that I know that element 3 is a duplicate of element 1, and element 5 is
> a duplicate of element 2, whereas the others were not duplicated according
> to our definition.)
>
> Is there a simple way to write this function? I have an ugly
> implementation in R that loops over all the values; it would make more sense
> to redo it in C, if there isn't a simple implementation I missed.

I'd think of making it a lookup table. The basic idea is

    split(seq_along(x), x)

but there are probably much faster ways of doing it, depending on what
you need. But for efficiency, you probably need a hashtable somewhere.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
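
A minimal sketch of how the lookup-table idea above could be turned into the requested function. It assumes that "duplicate of" means "index of the first occurrence of the same value", which is consistent with the example in the question but is not spelled out there. The name duplicates() is taken from the question; duplicates_split() is a hypothetical name used only for this illustration. The first version builds on split(seq_along(x), x) as suggested; the second uses match(), which does its lookup by hashing internally, so it may be one way to get the "hashtable somewhere" without writing C.

```r
# Version 1: based on split(seq_along(x), x).
# Each list element holds the positions of one distinct value; every
# position after the first in a group is marked with the first position.
duplicates_split <- function(x) {
  out <- rep(NA_integer_, length(x))
  for (idx in split(seq_along(x), x)) {
    out[idx[-1]] <- idx[1]   # no-op when the value occurs only once
  }
  out
}

# Version 2: based on match(), which returns the position of the first
# match of each element of x within x itself.
duplicates <- function(x) {
  first <- match(x, x)          # index of the first occurrence of each value
  first[!duplicated(x)] <- NA   # first occurrences are not duplicates
  first
}

x <- c(9, 7, 9, 3, 7)
duplicates_split(x)
duplicates(x)
```

For the x in the question, both calls return NA NA 1 NA 2. The two versions can differ when x contains NA: split() drops NA values when it forms the groups, whereas match() and duplicated() treat NA as equal to NA.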