All,

I believe that there is some room for adding several new convenience operators or functions to Perl 6 that are used with Mapping and Hash values.

Or getting more to the point, I believe that the need for the relational data model concept of a tuple (a "tuple" where elements are addressed by name not position) would be satisfied by the existing Perl 6 data types of Mapping (immutable variant) and Hash (mutable variant), but that some common relational operations would be a lot easier to express if Perl 6 had a few more operators that make them concise.

Below I will name some of these operators that, AFAIK, don't exist yet in some form; since they are all pure functions, I will use the Mapping type in their pseudo-Perl-6 signatures, but Hash versions should exist too. Or specifically, these should be part of the Mapping role, so anything that .does Mapping, such as a Hash, does them too? Some of these operators are like those for sets, but aren't exactly the same due to plain set ops not working for mappings or hashes as a whole.

I want to emphasize that the operator names are those used in DBMS contexts, but you can of course name them something else so that they fit better into Perl 6; what matters is having some concise way to get the desired semantics. Also, this functionality doesn't have to come from new operators; it could use existing ones if there is a concise way to do so. Likewise, some could conceivably be macros, if that wouldn't impair performance.

I also want to emphasize that I see this functionality being generally useful, and that it shouldn't just be shunted off to a third-party module.

1.  join() aka natural_join():

        function join of Mapping (Mapping $m1, Mapping $m2) { ... }

This binary operator is conceptually like a set-union operator, in that it derives a Mapping that has all of the distinct keys and values of its 2 arguments, assuming any matching keys also have matching values. (Note that "matching" specifically means that === returns true, or if users get a choice, then that is its default meaning.)

But if there are any matching keys with mismatching values, then this is a failure condition (the arguments are incompatible), and the function returns undef instead (or fails, though given the anticipated use case, undef is more appropriate). It is only possible for 2 arguments to be incompatible if they have any keys in common; if they have none, the result is guaranteed to be defined/successful. If the 2 arguments have all keys in common, they are compatible only if they are equal, in which case the result is equal to either.

This join() function is both commutative and associative, and can generalize to N arguments. Any equal arguments are redundant and so duplicates can be ignored. Given 2 or more arguments, each is unioned pairwise until 1 remains. Given 1 argument, the result is that argument. Given zero arguments, the result is a Mapping with zero elements. A zero-element Mapping is its identity value.

So join() can be used as a reduction operator, with the empty Mapping as its identity, except that it can return undef (or fail) if any 2 arguments have the same keys but different associated values.

For example:

        join( { a<1>, b<2> }, { b<2>, c<3> } )
                # returns { a<1>, b<2>, c<3> }
        join( { a<1>, b<2> }, { b<4>, c<3> } )
                # returns undef
        join( { a<1>, b<2> }, { c<3>, d<4> } )
                # returns { a<1>, b<2>, c<3>, d<4> }
        join( { a<1>, b<2> }, { a<1> } )
                # returns { a<1>, b<2> }
        join( { a<1> } )
                # returns { a<1> }
        join( { a<1> }, {} )
                # returns { a<1> }
        join()
                # returns {}
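
To make the semantics above concrete, here is a minimal sketch in Python, used purely for illustration and not as proposed Perl 6 syntax. It assumes plain dicts stand in for Mapping, None stands in for undef, and Python's == stands in for ===; the name join_pair is just a helper I made up for the sketch.

```python
from functools import reduce


def join(*mappings):
    """Natural join of zero or more dict 'tuples'; None plays the role of undef."""

    def join_pair(left, right):
        if left is None:
            return None  # an earlier pair was already incompatible
        for key, value in right.items():
            if key in left and left[key] != value:
                return None  # shared key with a mismatching value
        return {**left, **right}

    # The empty mapping is the identity value, so it seeds the reduction.
    return reduce(join_pair, mappings, {})


print(join({"a": 1, "b": 2}, {"b": 2, "c": 3}))  # {'a': 1, 'b': 2, 'c': 3}
print(join({"a": 1, "b": 2}, {"b": 4, "c": 3}))  # None
print(join())                                    # {}
```

Seeding the reduction with the empty mapping is what makes the zero-argument and one-argument cases fall out without special handling.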

In practice, if a relation were implemented, say, as a set of Mapping, then the relational (natural) join could then be implemented sort of like this:

        function join of Relation (Relation $r1, Relation $r2) {
                return Relation( grep <-- $r1.values XjoinX $r2.values );
        }

That is, the relational (natural) join could then simply be implemented as a pairwise invocation of the tuple join between every tuple in each relation, keeping only the results that are defined.
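
The same idea can be sketched in Python, again only as an illustration. Here a relation is assumed to be a list of dicts (Python dicts are not hashable, so a real set is not used), and tuple_join and relation_join are hypothetical names for this sketch.

```python
from itertools import product


def tuple_join(left, right):
    """Join two dict 'tuples'; None stands in for undef when they are incompatible."""
    for key, value in right.items():
        if key in left and left[key] != value:
            return None
    return {**left, **right}


def relation_join(r1, r2):
    """Pairwise tuple join over every combination, keeping only defined results."""
    result = []
    for t1, t2 in product(r1, r2):
        joined = tuple_join(t1, t2)
        # Skip failed joins and duplicates, since a relation is a set of tuples.
        if joined is not None and joined not in result:
            result.append(joined)
    return result


suppliers = [{"s": 1, "city": "London"}, {"s": 2, "city": "Paris"}]
parts = [{"p": 10, "city": "London"}, {"p": 11, "city": "Oslo"}]
print(relation_join(suppliers, parts))
```

Only the London supplier and the London part share a compatible value for the common key, so that is the only pair that survives.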

In this wider sense, a relational (natural) join is both an intersection in one dimension and a union in the other dimension.

Now, I'm not currently asking for Relation to be implemented as a Perl 6 feature (it is actually more complicated than "set of mapping"), but if Mapping|Hash had an operator like I mentioned, it would be easier to make one on top of it; moreover, the Mapping|Hash could also implement the "heading" of a relation (a name-to-declared-type map), not just its "body" composed of tuples (each being a name-to-value map).

2.  semijoin() aka matching():

        function semijoin of Mapping (Mapping $m1, Mapping $m2) { ... }

This operator is like join() except that it simply returns $m1 if the arguments are compatible, rather than a new mapping, and returns undef otherwise. (This assumes we're dealing with actual Mappings, which are immutable; with a Hash, perhaps making a new Hash is desired?) Therefore, in a wider relational semijoin() context, we are simply filtering $r1 by $r2. Note that a normal join() where $m2 is a subset of $m1 is functionally a semijoin() anyway. Also, unlike join(), semijoin() is *not* commutative.
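
A small Python sketch of that behaviour, for illustration only. It assumes dicts for Mapping, None for undef and == for ===; compatible is a helper name I invented for the sketch.

```python
def compatible(m1, m2):
    """True when every key the two mappings share carries an equal value."""
    return all(m1[key] == m2[key] for key in m1.keys() & m2.keys())


def semijoin(m1, m2):
    """Return m1 itself (not a copy) when compatible with m2, else None."""
    return m1 if compatible(m1, m2) else None


left = {"a": 1, "b": 2}
right = {"b": 2, "c": 3}
print(semijoin(left, right) is left)  # True
print(semijoin(right, left))          # {'b': 2, 'c': 3}
print(semijoin(left, {"b": 4}))       # None
```

Swapping the arguments changes which mapping comes back, which is why the operator is not commutative.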

3.  semidifference() aka not_matching():

        function semidifference of Mapping (Mapping $m1, Mapping $m2) { ... }

This operator is the complement of semijoin(), in that, given the same 2 Mapping arguments, it would return $m1 when semijoin() would return undef, and return undef when semijoin() would return $m1.
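
As a Python sketch (illustrative only, with the same stand-ins of dict for Mapping and None for undef, and a hypothetical compatible helper), the complement looks like this, along with how it would filter a list of tuples:

```python
def compatible(m1, m2):
    """True when every key the two mappings share carries an equal value."""
    return all(m1[key] == m2[key] for key in m1.keys() & m2.keys())


def semidifference(m1, m2):
    """Return m1 when it is NOT compatible with m2, else None."""
    return None if compatible(m1, m2) else m1


rows = [{"a": 1, "b": 2}, {"a": 1, "b": 4}]
probe = {"b": 2, "c": 3}

# Keep only the rows that do not match the probe tuple.
kept = [row for row in rows if semidifference(row, probe) is not None]
print(kept)  # [{'a': 1, 'b': 4}]
```

The test against None, rather than plain truthiness, matters because an empty mapping is a legitimate defined result.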

4.  rename():

        function rename of Mapping (Mapping $m, Str $old_k, Str $new_k) { ... }

This operator takes one Mapping argument and derives another that is identical except that one of its existing keys is replaced with a different, previously nonexistent key; the old key's value moves over to the new one, so only the key changes. It is invalid to rename a key to match another existing one, and doing so should fail (not return undef). Renaming a key to itself is a no-op.

This operator can be generalized so that it renames N keys rather than just one, in which case the old_k/new_k args can, e.g., be replaced by a Str<Str> hash argument; in the latter case, it is valid to swap 2 key names for each other.
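
Here is a minimal Python sketch of the generalized N-key form, for illustration only. It assumes a dict of old-name to new-name pairs for the Str<Str> argument and a raised exception for "fail"; treating a rename of a key that doesn't exist as a failure is my assumption, since the text above only implies it.

```python
def rename(mapping, renames):
    """Return a new dict with keys renamed per `renames` (old name -> new name)."""
    missing = [old for old in renames if old not in mapping]
    if missing:
        raise KeyError(f"cannot rename nonexistent key(s): {missing}")
    result = {}
    for key, value in mapping.items():
        new_key = renames.get(key, key)  # keys not mentioned keep their name
        if new_key in result:
            raise ValueError(f"rename would produce duplicate key {new_key!r}")
        result[new_key] = value
    return result


print(rename({"a": 1, "b": 2}, {"a": "x"}))            # {'x': 1, 'b': 2}
print(rename({"a": 1, "b": 2}, {"a": "b", "b": "a"}))  # {'b': 1, 'a': 2}
```

Because all renames are applied in one pass over the source, swapping two names works without a temporary key.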

5.  project() aka select():

        function project of Mapping (Mapping $m, @keys_to_keep) { ... }

This operator essentially takes a slice of the Mapping, except that the result is a Mapping too, pairing each projected key with its value. @keys_to_keep can have zero elements or all of the source Mapping's keys, but specifying a key that isn't in the source is a failure condition.
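
In Python terms, and only as an illustrative sketch (dict for Mapping, a raised KeyError for the failure condition), projection could look like this:

```python
def project(mapping, keys_to_keep):
    """Return a new dict holding only the requested keys and their values."""
    missing = [key for key in keys_to_keep if key not in mapping]
    if missing:
        raise KeyError(f"cannot project nonexistent key(s): {missing}")
    return {key: mapping[key] for key in keys_to_keep}


row = {"a": 1, "b": 2, "c": 3}
print(project(row, ["a", "c"]))  # {'a': 1, 'c': 3}
print(project(row, []))          # {}
```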

6.  remove() aka delete() aka project_all_but():

        function remove of Mapping (Mapping $m, @keys_to_remove) { ... }

This operator is the same as project() except that it projects all source elements *except* those specified in @keys_to_remove.
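
A matching Python sketch, illustrative only. Whether naming a key that isn't in the source should fail here, as it does for project(), is not spelled out above; the sketch assumes that it does.

```python
def remove(mapping, keys_to_remove):
    """Return a new dict with every element except the named keys."""
    missing = [key for key in keys_to_remove if key not in mapping]
    if missing:
        # Assumption: mirrors project(), where unknown keys are a failure.
        raise KeyError(f"cannot remove nonexistent key(s): {missing}")
    dropped = set(keys_to_remove)
    return {key: value for key, value in mapping.items() if key not in dropped}


row = {"a": 1, "b": 2, "c": 3}
print(remove(row, ["b"]))  # {'a': 1, 'c': 3}
```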

7.  compose():

        function compose of Mapping (Mapping $m1, Mapping $m2) { ... }

This operator is to join() as symmetric_difference() on a set is to union() on a set. It is like a macro that first joins $m1 and $m2, then does a projection on the result so that only keys that were in just one of the source mappings are in the final result, and any keys in common are not.
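
Sketched in Python for illustration (dict for Mapping, None for undef when the underlying join fails, == for ===):

```python
def compose(m1, m2):
    """Join m1 and m2, then drop every key the two had in common."""
    common = m1.keys() & m2.keys()
    if any(m1[key] != m2[key] for key in common):
        return None  # the join step fails, so compose does too
    joined = {**m1, **m2}
    return {key: value for key, value in joined.items() if key not in common}


print(compose({"a": 1, "b": 2}, {"b": 2, "c": 3}))  # {'a': 1, 'c': 3}
print(compose({"a": 1, "b": 2}, {"b": 4, "c": 3}))  # None
```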

8.  wrap():

        function wrap of Mapping (Mapping $m, Str @old_k, Str $new_k) { ... }

This operator takes one Mapping argument and derives another that is the same except that the 0..N elements with keys named by @old_k are removed, and then re-inserted as a single Mapping-typed element value whose key is $new_k. This operator fails if @old_k names any nonexistent keys, or if $new_k matches an existing key that isn't in @old_k.
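
A Python sketch of wrap(), for illustration only, with a nested dict standing in for the Mapping-typed element and raised exceptions standing in for "fails":

```python
def wrap(mapping, old_keys, new_key):
    """Move the elements named by old_keys into one nested dict under new_key."""
    missing = [key for key in old_keys if key not in mapping]
    if missing:
        raise KeyError(f"cannot wrap nonexistent key(s): {missing}")
    wrapped = set(old_keys)
    if new_key in mapping and new_key not in wrapped:
        raise ValueError(f"{new_key!r} already names an element that is not being wrapped")
    inner = {key: mapping[key] for key in mapping if key in wrapped}
    outer = {key: value for key, value in mapping.items() if key not in wrapped}
    outer[new_key] = inner
    return outer


person = {"name": "Ann", "street": "Main St", "city": "Oslo"}
print(wrap(person, ["street", "city"], "addr"))
# {'name': 'Ann', 'addr': {'street': 'Main St', 'city': 'Oslo'}}
```

Wrapping zero keys is allowed and simply adds an element whose value is the empty mapping.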

9.  unwrap():

        function unwrap of Mapping (Mapping $m, Str $old_k) { ... }

This operator is the inverse of wrap(); $old_k is the name of a Mapping-typed element value, and that element is replaced by the elements from the value. This operator fails if any element keys of the value for $old_k are the same as any other keys in $m besides $old_k.
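
And the inverse as a Python sketch, illustrative only, again with a nested dict for the Mapping-typed element and a raised exception for the failure case:

```python
def unwrap(mapping, old_key):
    """Replace the nested dict stored under old_key with its own elements."""
    inner = mapping[old_key]  # raises KeyError if old_key is absent
    rest = {key: value for key, value in mapping.items() if key != old_key}
    clashes = inner.keys() & rest.keys()
    if clashes:
        raise ValueError(f"unwrapped key(s) collide with existing ones: {sorted(clashes)}")
    return {**rest, **inner}


wrapped = {"name": "Ann", "addr": {"street": "Main St", "city": "Oslo"}}
print(unwrap(wrapped, "addr"))
# {'name': 'Ann', 'street': 'Main St', 'city': 'Oslo'}
```

The inner mapping may reuse the name of the element being unwrapped, since that key is removed before the collision check.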

Okay, so that's more or less it.

It's possible that additional operators may be useful, but I haven't thought them through yet. (Also, some relational operators don't make sense when applied just to individual tuples, and so they aren't mentioned above.)

Any feedback is appreciated, including both appropriate names for the semantics of the operators I mentioned and comparably concise syntax for doing the same with existing Perl 6 operators.

Thank you. -- Darren Duncan
