Re: [Rd] suggestion for sets tools upgrade

2014-02-08 Thread Carl Witthoft

Thanks to Duncan and all who responded.

I agree that the algebraic set rules do not allow for indistinguishable
elements;  I must have been deeply immersed in quantum fermions when I
wrote "strictly" rather than "less" in front of "algebraic style".


I'll clean up my code (so that intersect() remains symmetric, among
other things), and submit it as a separate package to CRAN.
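
A minimal sketch of what a symmetric version could look like, assuming multiset semantics in which each shared value is kept as many times as it occurs in the input where it is rarer. The name multi_intersect is hypothetical, NA values are not handled, and this is not necessarily the code that ends up in the package:

```r
# Hypothetical helper, not part of base R: multiset intersection in which
# each shared value appears min(count in x, count in y) times.
multi_intersect <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  common <- unique(x[x %in% y])   # distinct values present in both inputs
  # For each shared value, keep the smaller of its two multiplicities.
  times <- vapply(common,
                  function(v) min(sum(x == v), sum(y == v)),
                  integer(1))
  rep(common, times = times)
}

x <- c(1, 1, 2, 3)
y <- c(1, 1, 1, 4)
multi_intersect(x, y)   # 1 1
multi_intersect(y, x)   # 1 1
```

With this definition the result depends on the argument order only through the ordering of the output, which follows first appearance in the first argument.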


Carl


On 2/7/14 7:37 AM, Duncan Murdoch wrote:

On 14-02-06 8:31 PM, Carl Witthoft wrote:



My idea is to provide an upgrade to all the sets tools (intersect,
union, setdiff, setequal) that allows the user to apply them in a
strictly algebraic style.

The current tools, as well documented, remove duplicate values in the
input vectors.  This can be helpful in stats work, but is inconsistent
with the mathematical concept of sets and set measure.


I understand what you are asking for, but I think this justification for
it is just wrong.  Sets don't have duplicated elements:  an element is
in a set, or it is not.  It can't be in the set more than once.



What I propose
is that all these functions be given an additional argument with a
default value:  multiple=FALSE .  When called this way, the functions
remain as at present.  When called with multiple=TRUE,  they treat the
input vectors as true 'sets' of elements.



Here's an example of the new code:

intersect <- function (x, y, multiple=FALSE)
{
  y <- as.vector(y)
  trueint <- y[match(as.vector(x), y, 0L)]
  if(!multiple) trueint <- unique(trueint)
  return(trueint)
}


This is not symmetric.  I'd like intersect(x,y,TRUE) to be the same as
intersect(y,x,TRUE), up to re-ordering.  That's not true of your function:

> x <- c(1,1,2,3)
> y <- c(1,1,1,4)
> intersect(x,y,multiple=TRUE)
[1] 1 1
> intersect(y,x,multiple=TRUE)
[1] 1 1 1

I'd suggest that you clearly define what you mean by your functions, and
put them in a package, along with examples where they give more useful
results than the standard definitions.  I think the current base package
functions match the mathematical definitions better.

Duncan Murdoch




--

Sent from a parallel universe almost, but not entirely,
nothing at all like this one.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] suggestion for sets tools upgrade

2014-02-07 Thread Kevin Coombes

As a mathematician by training (and a former practicing mathematician,
both of which qualifications I rarely feel compelled to pull out of the 
closet), I have to agree with Michael's challenge to the original 
assertion about the mathematical concept of sets.


Sets are collections of distinct objects (at least in Cantor's original
naive definition) and do not have a notion of duplicate values.  In 
the modern axiomatic definition, one axiom is that two sets are equal 
if and only if they contain the same members. To expand on Michael's 
example, the union of {1, 2} with {1, 3} is {1, 2, 3}, not {1, 2, 1, 3} 
since there is only one distinct object designated by the value 1.


A computer programming language could choose to use the ordered vector 
(or list) [1, 2, 1, 3] as an internal representation of the union of 
[1,2] and [1,3], but it would then have to work hard to perform every
other meaningful set operation.  For instance, the cardinality of the 
union still has to equal three (not four, which is the length of the 
list), since there are exactly three distinct objects that are members. 
And, as Michael points out, the set represented by [1,2,3] has to be 
equal to the set represented by [1,2,1,3] since they contain exactly the 
same members.
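
The same points can be checked directly with the existing base R functions. This is only an illustrative sketch; the variable names a, b, u and v are made up for the example:

```r
a <- c(1, 2)
b <- c(1, 3)

u <- union(a, b)
u                    # 1 2 3
length(u)            # 3, the cardinality of the union

# The concatenated vector has length 4 but only 3 distinct members.
v <- c(a, b)
length(v)            # 4
length(unique(v))    # 3

# Equality as sets ignores repeats (and order).
setequal(c(1, 2, 3), c(1, 2, 1, 3))   # TRUE
```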


  Kevin

On 2/6/2014 9:39 PM, R. Michael Weylandt wrote:

On Thu, Feb 6, 2014 at 8:31 PM, Carl Witthoft c...@witthoft.com wrote:

First, let me apologize in advance if this is the wrong place to submit a
suggestion for a change to functions in the base-R package.  It never really
occurred to me that I'd have an idea worthy of such a change.

My idea is to provide an upgrade to all the sets tools (intersect, union,
setdiff, setequal) that allows the user to apply them in a strictly
algebraic style.

The current tools, as well documented, remove duplicate values in the input
vectors.  This can be helpful in stats work, but is inconsistent with the
mathematical concept of sets and set measure.

No comments about back-compatibility concerns, etc., but why do you
think this is closer to the mathematical concept of sets? As I
learned them, sets have no repeats (or order) and other languages with
set primitives tend to agree:

python> {1,1,2,3} == {1,2,3}
True

I believe C++ calls what you're looking for a multiset (albeit with a
guarantee of orderedness).

Cheers,
Michael



Re: [Rd] suggestion for sets tools upgrade

2014-02-07 Thread Duncan Murdoch

On 14-02-06 8:31 PM, Carl Witthoft wrote:

First, let me apologize in advance if this is the wrong place to submit
a suggestion for a change to functions in the base-R package.  It never
really occurred to me that I'd have an idea worthy of such a change.

My idea is to provide an upgrade to all the sets tools (intersect,
union, setdiff, setequal) that allows the user to apply them in a
strictly algebraic style.

The current tools, as well documented, remove duplicate values in the
input vectors.  This can be helpful in stats work, but is inconsistent
with the mathematical concept of sets and set measure.


I understand what you are asking for, but I think this justification for 
it is just wrong.  Sets don't have duplicated elements:  an element is 
in a set, or it is not.  It can't be in the set more than once.




What I propose
is that all these functions be given an additional argument with a
default value:  multiple=FALSE .  When called this way, the functions
remain as at present.  When called with multiple=TRUE,  they treat the
input vectors as true 'sets' of elements.

I've already written and tested upgrades to all four functions, so if
upgrading the base-R package is not appropriate, I'll post as a package
to CRAN.  It just seems more sensible to add to the base.

Thanks in advance for any advice or comments.
(Please be sure to email, as I can't recall if I'm currently registered
for r-devel)

Here's an example of the new code:

intersect <- function (x, y, multiple=FALSE)
{
  y <- as.vector(y)
  trueint <- y[match(as.vector(x), y, 0L)]
  if(!multiple) trueint <- unique(trueint)
  return(trueint)
}


This is not symmetric.  I'd like intersect(x,y,TRUE) to be the same as 
intersect(y,x,TRUE), up to re-ordering.  That's not true of your function:


> x <- c(1,1,2,3)
> y <- c(1,1,1,4)
> intersect(x,y,multiple=TRUE)
[1] 1 1
> intersect(y,x,multiple=TRUE)
[1] 1 1 1

I'd suggest that you clearly define what you mean by your functions, and 
put them in a package, along with examples where they give more useful 
results than the standard definitions.  I think the current base package 
functions match the mathematical definitions better.


Duncan Murdoch



[Rd] suggestion for sets tools upgrade

2014-02-06 Thread Carl Witthoft

First, let me apologize in advance if this is the wrong place to submit
a suggestion for a change to functions in the base-R package.  It never 
really occurred to me that I'd have an idea worthy of such a change.


My idea is to provide an upgrade to all the sets tools (intersect, 
union, setdiff, setequal) that allows the user to apply them in a 
strictly algebraic style.


The current tools, as well documented, remove duplicate values in the 
input vectors.  This can be helpful in stats work, but is inconsistent 
with the mathematical concept of sets and set measure.  What I propose 
is that all these functions be given an additional argument with a 
default value:  multiple=FALSE.  When called this way, the functions
remain as at present.  When called with multiple=TRUE, they treat the
input vectors as true 'sets' of elements.
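
For reference, the current behaviour being described looks like this in base R. The input vectors are illustrative values chosen for the example only:

```r
x <- c(1, 1, 2, 3)
y <- c(1, 1, 1, 4)

# Base R drops repeated values from the result of each operation.
intersect(x, y)   # 1
setdiff(x, y)     # 2 3
```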


I've already written and tested upgrades to all four functions, so if 
upgrading the base-R package is not appropriate, I'll post as a package 
to CRAN.  It just seems more sensible to add to the base.


Thanks in advance for any advice or comments.
(Please be sure to email, as I can't recall if I'm currently registered 
for r-devel)


Here's an example of the new code:

intersect <- function (x, y, multiple=FALSE)
{
  y <- as.vector(y)
  trueint <- y[match(as.vector(x), y, 0L)]
  if(!multiple) trueint <- unique(trueint)
  return(trueint)
}

thanks
Carl
-
