On 14-02-06 8:31 PM, Carl Witthoft wrote:
First, let me apologize in advance if this is the wrong place to submit
a suggestion for a change to functions in the base-R package.  It never
really occurred to me that I'd have an idea worthy of such a change.

My idea is to provide an upgrade to all the "sets" tools (intersect,
union, setdiff, setequal) that allows the user to apply them in a
strictly algebraic style.

The current tools, as well documented, remove duplicate values in the
input vectors.  This can be helpful in stats work, but is inconsistent
with the mathematical concept of sets and set measure.

I understand what you are asking for, but I think this justification for it is just wrong. Sets don't have duplicated elements: an element is in a set, or it is not. It can't be in the set more than once.
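
To illustrate the point, here is a minimal sketch of how the current base functions treat repeated values. The vectors are chosen only for demonstration, and the base:: prefix is used so the result does not depend on any redefinition of intersect later in this thread.

```r
# Example inputs with repeated values (illustrative only)
x <- c(1, 1, 2, 3)
y <- c(1, 1, 1, 4)

# Base R set functions discard duplicates: an element is either in or out
res_intersect <- base::intersect(x, y)            # 1
res_union     <- base::union(x, y)                # 1 2 3 4
res_equal     <- base::setequal(c(1, 1, 2), c(1, 2))  # TRUE

print(res_intersect)
print(res_union)
print(res_equal)
```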



What I propose is that all these functions be given an additional
argument with a default value: "multiple=FALSE".  When called this way,
the functions remain as at present.  When called with "multiple=TRUE",
they treat the input vectors as true 'sets' of elements.

I've already written and tested upgrades to all four functions, so if
upgrading the base-R package is not appropriate, I'll post as a package
to CRAN.  It just seems more sensible to add to the base.

Thanks in advance for any advice or comments.
(Please be sure to email, as I can't recall if I'm currently registered
for r-devel)

Here's an example of the new code:

intersect <- function(x, y, multiple = FALSE)
{
    y <- as.vector(y)
    trueint <- y[match(as.vector(x), y, 0L)]
    if (!multiple) trueint <- unique(trueint)
    return(trueint)
}

This is not symmetric. I'd like intersect(x,y,TRUE) to be the same as intersect(y,x,TRUE), up to re-ordering. That's not true of your function:

> x <- c(1,1,2,3)
> y <- c(1,1,1,4)
> intersect(x,y,multiple=TRUE)
[1] 1 1
> intersect(y,x,multiple=TRUE)
[1] 1 1 1

I'd suggest that you clearly define what you mean by your functions, and put them in a package, along with examples where they give more useful results than the standard definitions. I think the current base package functions match the mathematical definitions better.
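
As one possible example of a clearly defined, symmetric version, the sketch below keeps each value min(count in x, count in y) times, which is the usual multiset intersection. This is an assumption about what might be wanted, not the code proposed above; the name multiset_intersect is hypothetical and is not part of base R.

```r
# Hypothetical helper (not in base R): multiset intersection.
# Each common value is returned min(count in x, count in y) times,
# so the result is the same, up to ordering, when x and y are swapped.
multiset_intersect <- function(x, y) {
    x <- as.vector(x)
    y <- as.vector(y)
    common <- unique(x[x %in% y])   # distinct values present in both inputs
    # Count occurrences of each common value; tabulate() ignores the NAs
    # that match() produces for values not in 'common'.
    n_x <- tabulate(match(x, common), nbins = length(common))
    n_y <- tabulate(match(y, common), nbins = length(common))
    rep(common, times = pmin(n_x, n_y))
}

x <- c(1, 1, 2, 3)
y <- c(1, 1, 1, 4)
res_xy <- multiset_intersect(x, y)
res_yx <- multiset_intersect(y, x)
print(res_xy)
print(res_yx)
```

With the x and y from the example above, both calls give the same answer, since 1 occurs twice in x and three times in y and the smaller count is kept.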

Duncan Murdoch

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
