[Rd] duplicated() variation that goes both ways to capture all duplicates

2012-07-23 Thread Liviu Andronic
Dear all
The trouble with the current duplicated() function in R is that it can
report duplicates while searching fromFirst _or_ fromLast, but not
both ways. Often users will want to identify and extract all the
copies of the item that has duplicates, not only the duplicates
themselves.
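
As a minimal sketch of the behaviour described above, here is the same idea on a small character vector. The vector x and the name all_copies are made up for illustration and do not come from the man page.

```r
# Illustrative data only: "a" occurs three times, "b" twice, "c" once
x <- c("a", "b", "a", "c", "b", "a")

# Forward search: the first occurrence of each value is not flagged
duplicated(x)
# [1] FALSE FALSE  TRUE FALSE  TRUE  TRUE

# Backward search: the last occurrence of each value is not flagged
duplicated(x, fromLast = TRUE)
# [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE

# Union of the two searches: every copy of any value that occurs more than once
all_copies <- duplicated(x) | duplicated(x, fromLast = TRUE)
x[all_copies]
# [1] "a" "b" "a" "b" "a"
```

Neither single call flags every copy; only the union of the two does.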

To take the example from the man page:
 data(iris)
 iris[duplicated(iris), ]  ## duplicates while searching fromFirst
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
143          5.8         2.7          5.1         1.9 virginica
 iris[duplicated(iris, fromLast=TRUE), ]  ## duplicates while searching fromLast
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
102          5.8         2.7          5.1         1.9 virginica


To extract all the copies of the concerned items (original and
duplicates) one would need to do something like this:
 ## duplicates while searching bothWays
 iris[(duplicated(iris) | duplicated(iris, fromLast=TRUE)), ]
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
102          5.8         2.7          5.1         1.9 virginica
143          5.8         2.7          5.1         1.9 virginica


Unfortunately this is unnecessarily long and convoluted. Short of a
'bothWays' argument in duplicated(), I came up with a small wrapper
that simplifies the above:
duplicated2 <-
function(x, bothWays=TRUE, ...)
{
    if(!bothWays) {
        return(duplicated(x, ...))
    } else if(bothWays) {
        return((duplicated(x, ...) | duplicated(x, fromLast=TRUE, ...)))
    }
}
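
One possible tidied form of such a wrapper is sketched below. The name duplicated_all, the argument check and the toy data frame df are assumptions made for illustration; none of this is part of base R or of the proposal itself.

```r
# Hypothetical helper (not in base R): flag every copy of a duplicated item
duplicated_all <- function(x, bothWays = TRUE, ...) {
  # Assumption: bothWays must be a single non-missing logical value
  stopifnot(is.logical(bothWays), length(bothWays) == 1L, !is.na(bothWays))

  forward <- duplicated(x, ...)
  if (!bothWays) {
    return(forward)
  }
  # Note: passing fromLast through ... together with bothWays = TRUE fails,
  # because fromLast would then be supplied twice to duplicated()
  forward | duplicated(x, fromLast = TRUE, ...)
}

# Toy data frame: rows 1 and 3 are identical
df <- data.frame(id = c(1, 2, 1, 3), grp = c("a", "b", "a", "c"))
df[duplicated_all(df), ]
```

With bothWays = FALSE the helper falls back to a plain duplicated() call, so it can be dropped in wherever duplicated() is already used.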


Now the above can be achieved simply via:
 iris[duplicated2(iris), ]  ## duplicates while searching bothWays
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
102          5.8         2.7          5.1         1.9 virginica
143          5.8         2.7          5.1         1.9 virginica


So here's my inquiry: Would the R Core consider adding such
functionality in 'base' R? Either the---suitably cleaned
up---duplicated2() function above, or a bothWays argument in
duplicated() itself? Either of the two would improve user convenience
and reduce confusion. (In my case it took some time before I
understood the correct approach to this problem.)

Regards
Liviu



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] duplicated() variation that goes both ways to capture all duplicates

2012-07-23 Thread Duncan Murdoch

On 23/07/2012 8:49 AM, Liviu Andronic wrote:

So here's my inquiry: Would the R Core consider adding such
functionality in 'base' R? Either the---suitably cleaned
up---duplicated2() function above, or a bothWays argument in
duplicated() itself? Either of the two would improve user convenience
and reduce confusion. (In my case it took some time before I
understood the correct approach to this problem.)


I can't speak for all of R core, but I don't see the need for this in 
base R -- your solution looks fine to me.


Duncan Murdoch
