Re: RFC 179 (v1) More functions from set theory to manipulate arrays

Michael Maraist Thu, 31 Aug 2000 08:33:40 -0700

> More functions from set theory to manipulate arrays
>
> I'd like to use these functions in this way :
>
>  @c = union (@a, @b);
>  # which is different from @c = (@a, @b) because union does not
>  # duplicate identical elements
>
>  @c = intersection (@a, @b);
>  @c = diff(@a, @b);
>

Elements of this are found in many other languages; Pascal and Python, for
example, have a built-in "in" operator.
Maybe it's because of my engineering background, but I despise inefficiency,
and these sorts of operations require linear scans of the lists.  I've had to
make use of sets in a lot of my work, and very rarely do I use raw lists.  I'm
not saying that this shouldn't be implemented; I'm just saying that it irks me.

For basic sets I mainly use hashes.  Unfortunately, the syntax can be
somewhat cumbersome for the complete set-theory suite.  CPAN, on the other
hand, offers several set-theory modules that work very efficiently.
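
A minimal sketch of the hash approach for the three functions named in the
RFC.  The names union/intersection/diff come from the quoted proposal, but
this implementation is only an illustration: it takes array references
rather than bare arrays (no prototypes), and because hash keys are strings,
elements are compared by their string value.

```perl
use strict;
use warnings;

# Build a lookup hash (element => 1) from a list.
sub to_set { my %s = map { $_ => 1 } @_; return \%s; }

# Union: every element of either list, each once, in first-seen order.
sub union {
    my ($x, $y) = @_;
    my %seen;
    return grep { !$seen{$_}++ } @$x, @$y;
}

# Intersection: elements of the first list that also occur in the second.
sub intersection {
    my ($x, $y) = @_;
    my $in_y = to_set(@$y);
    my %seen;
    return grep { $in_y->{$_} && !$seen{$_}++ } @$x;
}

# Difference: elements of the first list that do not occur in the second.
sub diff {
    my ($x, $y) = @_;
    my $in_y = to_set(@$y);
    my %seen;
    return grep { !$in_y->{$_} && !$seen{$_}++ } @$x;
}

my @list_a = (1, 2, 3, 2);
my @list_b = (2, 3, 4);
my @c = union(\@list_a, \@list_b);   # (1, 2, 3, 4), unlike (@list_a, @list_b)
```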

In pursuing this RFC, there are a couple of things to consider:

First is the choice of arrays versus hashes for set storage.  Arrays are
obviously easier to construct, but hashes give faster implementations and
make membership tests easier.
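
To make the membership point concrete, here is a small sketch contrasting
the two storage choices (the variable names are just for illustration):

```perl
use strict;
use warnings;

my @colors = qw(red green blue);

# Array storage: each membership test scans the whole list.
my $in_array = grep { $_ eq 'green' } @colors;   # scalar context: match count

# Hash storage: one pass to build, then each test is a single lookup.
my %color_set = map { $_ => 1 } @colors;
my $in_hash   = exists $color_set{green};        # true
my $absent    = exists $color_set{purple};       # false
```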

Next, which subset of set theory should be implemented?  You obviously
refer to the basic and / or / xor, but in practice the other operators can
be very useful too.  Chaining operators (especially with array-based sets)
can be a performance nightmare, since every intermediate result has to be
scanned or re-hashed again.
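
One way around the chaining cost is to keep sets in hash form from start to
finish.  A rough sketch, assuming sets are hash refs of element => 1; the
function names (set_of, set_union, set_xor, is_subset, ...) are made up for
this example and also cover a couple of the "other operators":

```perl
use strict;
use warnings;

# Sets are hash refs (element => 1).  Every operator takes and returns that
# form, so results can be chained without converting back to arrays.
sub set_of { return +{ map { $_ => 1 } @_ } }

sub set_union {
    my ($x, $y) = @_;
    return +{ %$x, %$y };
}

sub set_inter {
    my ($x, $y) = @_;
    return +{ map { $_ => 1 } grep { exists $y->{$_} } keys %$x };
}

sub set_minus {
    my ($x, $y) = @_;
    return +{ map { $_ => 1 } grep { !exists $y->{$_} } keys %$x };
}

# Symmetric difference ("xor"): elements in exactly one of the two sets.
sub set_xor {
    my ($x, $y) = @_;
    return set_union(set_minus($x, $y), set_minus($y, $x));
}

# Subset test: true when every element of $x is also in $y.
sub is_subset {
    my ($x, $y) = @_;
    return !grep { !exists $y->{$_} } keys %$x;
}

my $s1 = set_of(1, 2, 3);
my $s2 = set_of(3, 4);
my $s3 = set_of(2, 3, 4, 5);

# (s1 union s2) intersect s3, with no list rescans between the steps.
my $chained = set_inter(set_union($s1, $s2), $s3);
print join(',', sort keys %$chained), "\n";   # prints 2,3,4
```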

Next, these operations can easily be implemented as a module (written in
C if you like) and simply distributed with the core library.  Since some of
us are working to tighten down the core of Perl and break things out into
modules, building them into the core language would obviously run counter
to our goal.
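
A sketch of what such a module might look like, using the standard Exporter.
The package name ListSet is hypothetical; here everything sits in one file,
so the import is done by hand instead of with "use ListSet".

```perl
use strict;
use warnings;

# Hypothetical module "ListSet".  In real use it would live in ListSet.pm
# on @INC and be loaded with:  use ListSet qw(union);
package ListSet;
use Exporter 'import';
our @EXPORT_OK = qw(union);

# Union of any number of array refs, without duplicates, first-seen order.
sub union {
    my %seen;
    return grep { !$seen{$_}++ } map { @$_ } @_;
}

package main;

# Single-file demo: import at run time, after @EXPORT_OK has been assigned.
ListSet->import('union');

my @u = union([1, 2], [2, 3]);
print "@u\n";   # prints 1 2 3
```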

Even the odd form of passing two arrays to the function could still work
with a module, through the use of prototypes ( sub union(\@\@) { ... } ).
Unfortunately, that prototype would require that only true arrays be
passed.  I would much rather be able to instantiate a new set on the fly,
as in:

    @u_set = union( \@set_a, [ 1 ] );
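
The two calling conventions side by side, as a sketch; union_arrays and
union_refs are hypothetical names chosen so both can live in one file.

```perl
use strict;
use warnings;

# Variant 1: prototype (\@\@).  Perl passes \@first and \@second, so the call
# reads union_arrays(@set_a, @set_b), but both arguments must be real arrays.
# The sub is declared before its calls so the prototype takes effect.
sub union_arrays(\@\@) {
    my ($x, $y) = @_;
    my %seen;
    return grep { !$seen{$_}++ } @$x, @$y;
}

# Variant 2: no prototype.  Callers pass references explicitly, so an
# anonymous array such as [ 1 ] works as an on-the-fly set.
sub union_refs {
    my %seen;
    return grep { !$seen{$_}++ } map { @$_ } @_;
}

my @set_a = (1, 2, 3);
my @set_b = (3, 4);

my @u1    = union_arrays(@set_a, @set_b);   # (1, 2, 3, 4)
my @u_set = union_refs(\@set_a, [ 1 ]);     # (1, 2, 3)
# union_arrays(@set_a, [ 1 ]) would not compile: arg 2 must be an array.
```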

That would be closer to push, which doesn't require two arrays to be
passed.  Perhaps it could be prototyped as ( sub union(\@@) { ... } ),
which would allow additions to the set as in:

    @u_set = union( @set_a, [ 1 ] );
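
A sketch of the (\@@) form.  With that prototype everything after the first
array arrives as a flat list, so the function has to decide what a trailing
[ 1 ] means.  This example assumes array refs are sets to merge and plain
scalars are single elements to add, push-style; that rule is an assumption,
not part of the proposal.

```perl
use strict;
use warnings;

# Prototype (\@@): the first argument must be a real array and arrives as a
# reference; the rest is an ordinary list.
# Assumption: array refs in the rest are merged as sets, and plain scalars
# are added as single elements.
sub union(\@@) {
    my ($base, @rest) = @_;
    my %seen;
    return grep { !$seen{$_}++ }
        @$base, map { ref $_ eq 'ARRAY' ? @$_ : $_ } @rest;
}

my @set_a = (1, 2, 3);
my @u_set = union(@set_a, [ 1 ]);         # (1, 2, 3)
my @more  = union(@set_a, 4, [ 5, 6 ]);   # (1, 2, 3, 4, 5, 6)
```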

As you can see, there are many ways in which this could work, each with
different advantages and disadvantages.  Given this, I remain of the
opinion that CPAN modules are the way to go (possibly making one of them
standard).

-Michael