Re: request new Mapping|Hash operators

John Beppu Fri, 23 Mar 2007 18:15:38 -0800

You might find Dee interesting:

http://www.quicksort.co.uk/


A relational language extension for Python

Inspired by 'The Third Manifesto', a book by Chris Date and Hugh Darwen,
we're putting forward an implementation of a truly relational language using
Python (Dee). We address two problems:

1. The impedance mismatch between programming languages and databases
2. The weakness and syntactic awkwardness of SQL


On 2/27/07, Darren Duncan <[EMAIL PROTECTED]> wrote:


All,

I believe that there is some room for adding several new convenience
operators or functions to Perl 6 that are used with Mapping and Hash
values.

Or, getting more to the point, I believe that the need for the
relational data model concept of a tuple (a "tuple" whose elements
are addressed by name, not position) would be satisfied by the
existing Perl 6 data types of Mapping (immutable variant) and Hash
(mutable variant), but that some common relational operations would
be a lot easier to express if Perl 6 had a few more operators that
make them concise.

Below I will name some of these operators that, AFAIK, don't exist
yet in any form; since they are all pure functions, I will use the
Mapping type in their pseudo-Perl-6 signatures, but Hash versions
should exist too.  More specifically, perhaps these should be part of
the Mapping role, so that anything that .does Mapping, such as a Hash,
does them too.  Some of these operators are like those for sets, but
aren't exactly the same, because plain set ops don't work on mappings
or hashes as a whole.

I want to emphasize that the operator names are those that are used
in DBMS contexts, but you can of course name them something else in
order for them to fit better into Perl 6; what's important is having
some concise way to get the desired semantics.  Also, this
functionality doesn't have to be with new operators, but could
utilize existing ones if there is a concise way to do so.  Likewise,
some could conceivably be macros, if that wouldn't impair performance.

I also want to emphasize that I see this functionality being
generally useful, and that it shouldn't just be shunted off to a
third-party module.

1.  join() aka natural_join():

        function join of Mapping (Mapping $m1, Mapping $m2) { ... }

        This binary operator is conceptually like a set-union
operator, in that it derives a Mapping that has all of the distinct
keys and values of its 2 arguments, assuming any matching keys also
have matching values.  (Note that "matching" specifically means that
=== returns true, or if users get a choice, then that is its default
meaning.)

        But if there are any matching keys with mismatching values,
then this is a failure condition (they are incompatible), and the
function returns undef instead (or fails, though given the anticipated
use case, undef is more appropriate).  It is only possible for 2
arguments to be incompatible if they have any keys in common; if they
have none, the result is guaranteed to be defined/successful.  If the
2 arguments have all keys in common, they must be equal for the join
to succeed, and the result is then equal to either.

        This join() function is both commutative and associative, and
can generalize to N arguments.  Any equal arguments are redundant and
so duplicates can be ignored.  Given 2 or more arguments, each is
unioned pairwise until 1 remains.  Given 1 argument, the result is
that argument.  Given zero arguments, the result is a Mapping with
zero elements.  A zero-element Mapping is its identity value.

        So join() can be used as a reduction operator whose identity
value is the empty Mapping, except that it returns undef (or fails)
if any 2 arguments have a key in common with different associated
values.

        For example:

        join( { :a<1>, :b<2> }, { :b<2>, :c<3> } )
                # returns { :a<1>, :b<2>, :c<3> }
        join( { :a<1>, :b<2> }, { :b<4>, :c<3> } )
                # returns undef
        join( { :a<1>, :b<2> }, { :c<3>, :d<4> } )
                # returns { :a<1>, :b<2>, :c<3>, :d<4> }
        join( { :a<1>, :b<2> }, { :a<1> } )
                # returns { :a<1>, :b<2> }
        join( { :a<1> } )
                # returns { :a<1> }
        join( { :a<1> }, {} )
                # returns { :a<1> }
        join()
                # returns {}
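
        A minimal Python sketch of these semantics, assuming a plain
dict stands in for a Perl 6 Mapping, None for undef, and Python's ==
for ===; join2() and join() are hypothetical names, not an existing
API.  The N-ary form is a fold that starts from the empty mapping, so
each of the examples above falls out directly:

```python
from functools import reduce
from typing import Any, Mapping, Optional


def join2(m1: Mapping[str, Any], m2: Mapping[str, Any]) -> Optional[dict]:
    """Binary tuple join: all keys of both, or None if a shared key disagrees."""
    for k in m1.keys() & m2.keys():
        if m1[k] != m2[k]:  # '==' stands in for Perl 6 '==='
            return None  # incompatible arguments; None stands in for undef
    return {**m1, **m2}  # a new dict; neither argument is modified


def join(*mappings: Mapping[str, Any]) -> Optional[dict]:
    """N-ary join as a reduction whose identity value is the empty mapping."""
    def step(acc: Optional[dict], m: Mapping[str, Any]) -> Optional[dict]:
        # once any pair is incompatible, the overall result stays None
        return None if acc is None else join2(acc, m)
    return reduce(step, mappings, {})
```

Passing {} as reduce()'s initializer is what makes join() with zero
arguments return the empty mapping rather than raising.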

        In practice, if a relation were implemented as, say, a set of
Mappings, then the relational (natural) join could be implemented
sort of like this:

        function join of Relation (Relation $r1, Relation $r2) {
                # keep only the pairwise tuple joins that are defined
                return Relation( grep { .defined }, $r1.values XjoinX $r2.values );
        }

        That is, the relational (natural) join could then simply be
implemented as a pairwise invocation of the tuple join between every
tuple in each relation, keeping only the results that are defined.
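
        The same idea in Python, as a sketch only: a relation is
modeled here as a list of dicts without duplicates (dicts are
unhashable, so a real set is not used), and tuple_join() and
relation_join() are hypothetical names.  Every pair of tuples is
joined and only the non-None results are kept:

```python
from itertools import product
from typing import Any, Iterable, List, Mapping, Optional


def tuple_join(t1: Mapping[str, Any], t2: Mapping[str, Any]) -> Optional[dict]:
    """Join two tuples, or return None if they disagree on a shared key."""
    if any(t1[k] != t2[k] for k in t1.keys() & t2.keys()):
        return None
    return {**t1, **t2}


def relation_join(r1: Iterable[Mapping[str, Any]],
                  r2: Iterable[Mapping[str, Any]]) -> List[dict]:
    """Natural join: pairwise tuple join over r1 x r2, keeping defined results."""
    result: List[dict] = []
    for t1, t2 in product(r1, r2):
        t = tuple_join(t1, t2)
        if t is not None and t not in result:  # relations hold no duplicate tuples
            result.append(t)
    return result
```

When the two relations share no keys at all, every pair is compatible
and the result is their Cartesian product, which matches the "union in
the other dimension" reading below.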

        In this wider sense, a relational (natural) join is both an
intersection in one dimension and a union in the other dimension.

        Now, I'm not currently asking for Relation to be implemented
as a Perl 6 feature (it is actually more complicated than "set of
Mapping"), but if Mapping|Hash had an operator like the one I
mentioned, it would be easier to build one on top of it; moreover, a
Mapping|Hash could also implement the "heading" of a relation (a
name-to-declared-type map), not just its "body" composed of tuples
(each being a name-to-value map).

2.  semijoin() aka matching():

        function semijoin of Mapping (Mapping $m1, Mapping $m2) { ... }

        This operator is like join() except that, if the arguments are
compatible, it simply returns $m1 rather than a new mapping.  (This
assumes we're dealing with actual Mapping values, which are
immutable; with a Hash instead, perhaps making a new Hash is
desired?)  Therefore, in a wider relational semijoin() context, we are
simply filtering $r1 by $r2.  Note that a normal join() where $m2 is
a subset of $m1 is functionally a semijoin() anyway.  Also, unlike
join(), semijoin() is *not* commutative.
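
        A Python sketch, assuming dicts for Mappings and None for
undef; compatible(), semijoin() and relation_semijoin() are
hypothetical names.  The tuple-level version returns its first
argument unchanged, and the relation-level version is just a filter
of $r1 by $r2:

```python
from typing import Any, Iterable, List, Mapping, Optional


def compatible(m1: Mapping[str, Any], m2: Mapping[str, Any]) -> bool:
    """True if m1 and m2 agree on every key they share."""
    return all(m1[k] == m2[k] for k in m1.keys() & m2.keys())


def semijoin(m1: Mapping[str, Any], m2: Mapping[str, Any]) -> Optional[Mapping[str, Any]]:
    """Return m1 itself (not a copy) when it is compatible with m2, else None."""
    return m1 if compatible(m1, m2) else None


def relation_semijoin(r1: Iterable[Mapping[str, Any]],
                      r2: List[Mapping[str, Any]]) -> List[Mapping[str, Any]]:
    """Keep each tuple of r1 that is compatible with at least one tuple of r2."""
    return [t1 for t1 in r1 if any(semijoin(t1, t2) is not None for t2 in r2)]
```

Swapping the arguments changes which side's keys survive, which is
why semijoin() is not commutative.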

3.  semidifference() aka not_matching():

        function semidifference of Mapping (Mapping $m1, Mapping $m2) { ... }

        This operator is the complement of semijoin(), in that, given
the same 2 Mapping arguments, it would return $m1 when semijoin()
would return undef, and return undef when semijoin() would return $m1.
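
        A Python sketch of the complement, under the same assumptions
(dicts for Mappings, None for undef, hypothetical function names).
At the relation level it keeps the tuples of $r1 that match no tuple
of $r2:

```python
from typing import Any, Iterable, List, Mapping, Optional


def semidifference(m1: Mapping[str, Any], m2: Mapping[str, Any]) -> Optional[Mapping[str, Any]]:
    """Return m1 when m1 and m2 disagree on some shared key, else None."""
    if any(m1[k] != m2[k] for k in m1.keys() & m2.keys()):
        return m1
    return None


def relation_semidifference(r1: Iterable[Mapping[str, Any]],
                            r2: List[Mapping[str, Any]]) -> List[Mapping[str, Any]]:
    """Keep each tuple of r1 that is incompatible with every tuple of r2."""
    return [t1 for t1 in r1 if all(semidifference(t1, t2) is not None for t2 in r2)]
```

Note that two tuples with no keys in common are compatible, so
semidifference() returns None for them.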

4.  rename():

        function rename of Mapping (Mapping $m, Str $old_k, Str $new_k) { ... }

        This operator takes one Mapping argument and derives another
that is identical except that one of its existing keys is replaced
with a different, previously nonexistent key, with the old key's value
moved over to the new one; only a key changes.  Renaming a key to
match another existing key is invalid and should fail (not return
undef).  Renaming a key to itself is a no-op.
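
        A Python sketch of the single-key form, assuming a dict for
the Mapping and a raised exception for "fail"; rename() is a
hypothetical name, and failing when $old_k is absent is an assumption
the text implies but doesn't state:

```python
from typing import Any, Dict, Mapping


def rename(m: Mapping[str, Any], old_k: str, new_k: str) -> Dict[str, Any]:
    """Return a new dict in which old_k is renamed to new_k, value unchanged."""
    if old_k not in m:
        raise KeyError(old_k)  # assumption: renaming a missing key also fails
    if old_k == new_k:
        return dict(m)  # renaming a key to itself is a no-op
    if new_k in m:
        raise ValueError(f"cannot rename {old_k!r}: key {new_k!r} already exists")
    result = {k: v for k, v in m.items() if k != old_k}
    result[new_k] = m[old_k]
    return result
```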

        This operator can be generalized to rename N keys rather than
just one, in which case the old_k/new_k args could, e.g., be replaced
by a Str<Str> hash argument; in that case, it is valid to swap 2 key
names for each other.
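
        A Python sketch of the generalized form, assuming a
{old_key: new_key} dict plays the role of the Str<Str> hash argument;
rename_many() is a hypothetical name.  All renames are applied at
once, which is what makes a swap legal, and any two keys landing on
the same name is treated as a failure:

```python
from typing import Any, Dict, Mapping


def rename_many(m: Mapping[str, Any], renames: Mapping[str, str]) -> Dict[str, Any]:
    """Apply every old->new rename simultaneously, so swapping two keys works."""
    missing = [k for k in renames if k not in m]
    if missing:
        raise KeyError(missing)
    result = {renames.get(k, k): v for k, v in m.items()}
    if len(result) != len(m):
        # two source keys ended up with the same name
        raise ValueError("renaming would merge two keys into one")
    return result
```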

5.  project() aka select():

        function project of Mapping (Mapping $m, @keys_to_keep) { ... }

        This operator essentially takes a slice of the Mapping, except
that the result is itself a Mapping, keeping the values of the
projected keys.  @keys_to_keep can name zero keys or all of the
source Mapping's keys, but specifying a key that isn't in the source
is a failure condition.
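
        A Python sketch, assuming a dict for the Mapping and a
KeyError for the failure condition; project() is a hypothetical name:

```python
from typing import Any, Dict, Iterable, Mapping


def project(m: Mapping[str, Any], keys_to_keep: Iterable[str]) -> Dict[str, Any]:
    """Return a new dict holding only the named keys; unknown keys are an error."""
    keys = list(keys_to_keep)
    missing = [k for k in keys if k not in m]
    if missing:
        raise KeyError(missing)
    return {k: m[k] for k in keys}
```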

6.  remove() aka delete() aka project_all_but():

        function remove of Mapping (Mapping $m, @keys_to_remove) { ... }

        This operator is like project(), but it keeps all source
elements *except* those specified in @keys_to_remove.
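
        A Python sketch, assuming a dict for the Mapping; remove() is a
hypothetical name, and failing on keys that aren't in the source is an
assumption carried over from project(), since the text doesn't say:

```python
from typing import Any, Dict, Iterable, Mapping


def remove(m: Mapping[str, Any], keys_to_remove: Iterable[str]) -> Dict[str, Any]:
    """Return a new dict holding every key except the named ones."""
    drop = set(keys_to_remove)
    missing = [k for k in drop if k not in m]
    if missing:
        raise KeyError(sorted(missing))  # assumption: same failure rule as project()
    return {k: v for k, v in m.items() if k not in drop}
```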

7.  compose():

        function compose of Mapping (Mapping $m1, Mapping $m2) { ... }

        This operator is to join() as symmetric_difference() is to
union() on sets.  It is like a macro that first joins $m1 and $m2,
then projects the result so that only keys that were in just one of
the source mappings are in the final result, and any keys in common
are not.
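
        A Python sketch, assuming dicts for Mappings and None for
undef; compose() is a hypothetical name, and returning None when the
underlying join fails is an assumption the text implies but doesn't
state:

```python
from typing import Any, Dict, Mapping, Optional


def compose(m1: Mapping[str, Any], m2: Mapping[str, Any]) -> Optional[Dict[str, Any]]:
    """Join m1 and m2, then drop every key the two had in common."""
    common = m1.keys() & m2.keys()
    if any(m1[k] != m2[k] for k in common):
        return None  # assumption: compose fails the same way join() does
    joined = {**m1, **m2}
    return {k: v for k, v in joined.items() if k not in common}
```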

8.  wrap():

        function wrap of Mapping (Mapping $m, Str @old_k, Str $new_k) { ... }

        This operator takes one Mapping argument and derives another
that is the same except that the 0..N elements with keys named by
@old_k are removed and then re-inserted as a single Mapping-typed
element value whose key is $new_k.  This operator fails if @old_k
names any nonexistent keys, or if $new_k matches an existing key that
isn't in @old_k.
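
        A Python sketch, assuming dicts for Mappings and raised
exceptions for "fail"; wrap() is a hypothetical name.  Both failure
rules from the text are checked before anything is built:

```python
from typing import Any, Dict, Mapping, Sequence


def wrap(m: Mapping[str, Any], old_ks: Sequence[str], new_k: str) -> Dict[str, Any]:
    """Move the elements named by old_ks into one nested dict stored under new_k."""
    missing = [k for k in old_ks if k not in m]
    if missing:
        raise KeyError(missing)
    if new_k in m and new_k not in old_ks:
        raise ValueError(f"key {new_k!r} already exists and is not being wrapped")
    inner = {k: m[k] for k in old_ks}
    result = {k: v for k, v in m.items() if k not in inner}
    result[new_k] = inner
    return result
```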

9.  unwrap():

        function unwrap of Mapping (Mapping $m, Str $old_k) { ... }

        This operator is the inverse of wrap(); $old_k is the key of a
Mapping-typed element value, and that element is replaced by the
elements of that value.  This operator fails if any element keys of
the value for $old_k are the same as any other keys in $m besides
$old_k.
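
        A Python sketch, assuming dicts for Mappings and raised
exceptions for "fail"; unwrap() is a hypothetical name.  The nested
keys may reuse $old_k itself, since that key is removed first:

```python
from collections.abc import Mapping
from typing import Any, Dict


def unwrap(m: Mapping, old_k: str) -> Dict[str, Any]:
    """Replace the nested mapping stored under old_k with its own elements."""
    inner = m[old_k]  # KeyError if old_k is absent
    if not isinstance(inner, Mapping):
        raise TypeError(f"value of {old_k!r} is not a mapping")
    rest = {k: v for k, v in m.items() if k != old_k}
    clashes = [k for k in inner if k in rest]
    if clashes:
        raise ValueError(f"unwrapped keys collide with existing keys: {clashes}")
    return {**rest, **inner}
```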

Okay, so that's more or less it.

It's possible that additional operators may be useful, but I haven't
thought them through yet.  (Also, some relational operators don't
make sense when applied to individual tuples, and so they aren't
mentioned above.)

Any feedback is appreciated, including both appropriate names for
the semantics of the operators I mentioned and/or comparably concise
syntax for doing the same with existing Perl 6 operators.

Thank you. -- Darren Duncan
