Perl 6 and Set Theory

Luke Palmer Fri, 06 Dec 2002 13:04:02 -0800

=head1 Perl 6 and Set Theory

This document will introduce a new way of thinking about some Perl 6
constructs.  In addition, it proposes some minor changes that would
help this way of thinking be more consistent.  These changes may make
Perl 6 a better language in general, as a side effect.


Even in absence of an explicit "set" type, Perl 6 is quite proficient
in set theory.  There are two types of sets it knows about: finite and
infinite.  The former we call "junctions," while "classes" make up
infinite sets.

=head2 Junctions

Junctions are a transparent mechanism for dealing with finite sets.
They indeed hold a discrete set of objects, but operations on the
junction are automatically delegated to operations on its members.
The two types of junctions, disjunctive (C<any>) and conjunctive
(C<all>) just determine how the junction is evaluated in boolean
context.

The expressions:

    1 | 2 | 3
    any(1, 2, 3)
    1 & 2 & 3
    all(1, 2, 3)

represent the set:

    { 1, 2, 3 }

Which we will call N. Performing some operation I<z> on them
constructs and returns the set:

    { z(x): x ∈ N }

If, for example, this operation is C<{ $^x + 2 }>, we have:

    { x + 2: x ∈ N }

or

    { 3, 4, 5 }

If that was a comparison operation, say C<{ $^x > 2 }>, the resultant
set would be:

    { undef, undef, 1 }

Evaluating this in boolean context should be true if the junction was
conjunctive, and false if it was disjunctive.  So, for a junction J,
boolean context evaluation is defined as follows:

          { ∃x: x ∈ J ∧ ?x, disjunctive
    ?J ≡  {
          { ∀x: x ∈ J ⇒ ?x, conjunctive

That is, the "type" of junction just determines whether the quantifier
is existential or universal.  

There is one behavior of junctions, however, that doesn't work with
this definition:  If a boolean operation is applied in non-boolean
context to a junction, it will return a junction of only the values
for which the condition evaluated true.  But with a few minor changes,
this can be achieved:

    • Junctions discard C<undef>s on-sight (i.e. C<undef> can never be
      a member of a junction).
    • Comparison operators return C<undef> if they fail, and their
      left-hand side "C<but true>" if they succeed.
    • Comparison operators are right-associative.

Unfortunately, the last change clobbers the left-short-circuiting
behavior of comparison operators.  A way to fix this would be to treat
a chain of comparison operators as a single n-ary operator.  Other
methods for fixing this are welcome (and encouraged).

Note that the definition of how junctions behave in itself allows
operators involving more than one junction to represent the outer
product of those junctions with respect to an operator.  Consider two
sets, A = any(1,2,3), and B = all(4,5,6).

    A + B = { x + B: x ∈ A }
          = { { x + y: y ∈ B }: x ∈ A }
          = { { 5, 6, 7 }, { 6, 7, 8 }, { 7, 8, 9 } }

Where each inner set is a conjunctive set, and the outer is a
disjunctive set.  If you call this resultant set C, and ask:

    5 < C < 9

It would be true (in fact, "C<5 but true>"), as all(6, 7, 8) would
match.

=head2 Classes

In Perl 6, classes represent infinite sets of objects.  The class
C<Int> represents the set of integers, C<Num> the set of all numbers,
including the integers.  Inheritance represents supersets.  Finite
sets, or junctions, can be used for union and intersection of these
infinite sets.

Unlike junctions, classes are not transparent interfaces to their
members.  You can't say C<< Int > 10 >> to find out whether all
integers are greater than 10.  There is, in fact, no way to assert
this in Perl, as Perl would have to instantiate ℵ₀ objects, which
would take ℵ₀ seconds (forever).  Similarly with Num, except now
there's ℵ₁ objects and seconds (foreverer).

For instance, to declare a variable that can represent either an
integer or a string:

    my (Int | Str) $var;

The type of that variable is the union of the sets of integers and
strings.  The value of $var must be a member of at least one of those
sets.  

Since a derived object can be used in place of its parent anywhere, it
must represent an improper superset of its parent.  This seems
backwards, but if you think about it logically:  Every member of the
parent set needs to be a member of the child set; every member of the
child set need not be a member of the parent set.  Q.E.D.

Multiple inheritance represents a superset of multiple sets.  

C<Object> or C<UNIVERSAL> represents the "universe"; i.e. the set of
all objects.  The null junction C<any()> represents the empty set
(which is different from C<undef>, as proposed above).

It is an interesting issue whether any junction can be used as a
class.  That is:

    my (1 | 2 | 4) $var;

would declare $var to only accept values 1, 2, and 4.  This probably
introduces too much complexity for what it's worth.

=head2 Summary

The rules for a junction J, again, are:

    z(J) ≡ { z(x): x ∈ J }

           { ∃x: x ∈ J ∧ ?x, disjunctive
    ?J   ≡ {
           { ∀x: x ∈ J ⇒ ?x, conjunctive

Classes have no such rules, as they are not transparent.

=cut

There could be good, mathematically consistent ways to do the things I
proposed better.  Specifically regarding C<undef> and the comparison
operators.  As far as C<undef>, I just think we need to look at it
differently. 

I'd also like junctions and classes to be as powerful as real sets,
which means there needs to be an empty set.  Is any() a good name for
this, or should we use a more atomic name (like, god forbid, ∅).

Luke

Perl 6 and Set Theory

Reply via email to