request new Mapping|Hash operators

Darren Duncan Tue, 27 Feb 2007 00:18:55 -0800

All,

I believe that there is some room for adding several new convenience operators or functions to Perl 6 that are used with Mapping and Hash values.

Or getting more to the point, I believe that the need for the relational data model concept of a tuple (a "tuple" where elements are addressed by name not position) would be satisfied by the existing Perl 6 data types of Mapping (immutable variant) and Hash (mutable variant), but that some common relational operations would be a lot easier to express if Perl 6 had a few more operators that make them concise.

Below I will name some of these operators that, AFAIK, don't exist yet in some form; since they are all pure functions, I will use the Mapping type in their pseudo-Perl-6 signatures, but Hash versions should exist too. Or specifically, these should be part of the Mapping role, so anything that .does Mapping, such as a Hash, does them too? Some of these operators are like those for sets, but aren't exactly the same due to plain set ops not working for mappings or hashes as a whole.

I want to emphasize that the operator names are those that are used in DBMS contexts, but you can of course name them something else in order for them to fit better into Perl 6; the importance is having some concise way to get the desired semantics. Also, this functionality doesn't have to be with new operators, but could utilize existing ones if there is a concise way to do so. Likewise, some could conceivably be macros, if it wouldn't impair performance.

I also want to emphasize that I see this functionality being generally useful, and that it shouldn't just be shunted off to a third-party module.


1.  join() aka natural_join():

        function join of Mapping (Mapping $m1, Mapping $m2) { ... }

This binary operator is conceptually like a set-union operator, in that it derives a Mapping that has all of the distinct keys and values of its 2 arguments, assuming any matching keys also have matching values. (Note that "matching" specifically means that === returns true, or if users get a choice, then that is its default meaning.)

But if there are any matching keys with mismatching values, then this is a failure condition (they are incompatible), and the function returns undef instead (or fail, though given the anticipated use case, undef is more appropriate). It is only possible for 2 arguments to be incompatible if they have any keys in common; if they have none, the result is guaranteed to be defined/successful. If the 2 arguments have all keys in common, they must be equal, and the result is also equal to either.

This join() function is both commutative and associative, and can generalize to N arguments. Any equal arguments are redundant and so duplicates can be ignored. Given 2 or more arguments, each is unioned pairwise until 1 remains. Given 1 argument, the result is that argument. Given zero arguments, the result is a Mapping with zero elements. A zero-element Mapping is its identity value.

So join() can be used as a reduction operator whose identity is the empty Mapping, though it can return undef (or fail) instead if any 2 arguments have the same keys but different associated values.


        For example:

        join( { a => 1, b => 2 }, { b => 2, c => 3 } )
                # returns { a => 1, b => 2, c => 3 }
        join( { a => 1, b => 2 }, { b => 4, c => 3 } )
                # returns undef
        join( { a => 1, b => 2 }, { c => 3, d => 4 } )
                # returns { a => 1, b => 2, c => 3, d => 4 }
        join( { a => 1, b => 2 }, { a => 1 } )
                # returns { a => 1, b => 2 }
        join( { a => 1 } )
                # returns { a => 1 }
        join( { a => 1 }, {} )
                # returns { a => 1 }
        join()
                # returns {}
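
To make the join() semantics above concrete, here is a minimal runnable sketch. It is written in Python rather than Perl 6, purely so that it can be executed today; a dict stands in for a Mapping, None stands in for undef, and Python's == stands in for ===. Those substitutions are assumptions of the sketch, not part of the proposal.

```python
from functools import reduce


def join(*mappings):
    """N-ary natural join of dicts; returns None (undef) on a conflict."""

    def join2(m1, m2):
        if m1 is None or m2 is None:
            return None  # an earlier conflict propagates through the reduction
        # Only keys present in both arguments can make them incompatible.
        for key in m1.keys() & m2.keys():
            if m1[key] != m2[key]:
                return None
        return {**m1, **m2}

    # The empty mapping is the identity value, so it seeds the reduction.
    return reduce(join2, mappings, {})


print(join({"a": 1, "b": 2}, {"b": 2, "c": 3}))
print(join({"a": 1, "b": 2}, {"b": 4, "c": 3}))
print(join())
```

Seeding the reduction with the empty mapping is what makes the zero-argument and one-argument cases fall out without special handling.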

In practice, if a relation were implemented, say, as a set of Mapping, then the relational (natural) join could then be implemented sort of like this:


        function join of Relation (Relation $r1, Relation $r2) {
                return Relation( grep <-- $r1.values XjoinX $r2.values );
        }

That is, the relational (natural) join could then simply be implemented as a pairwise invocation of the tuple join between every tuple in each relation, keeping only the results that are defined.
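
The pairwise idea can be tried out with a small sketch, again in Python rather than Perl 6. It assumes a relation is represented as a list of dicts (Python dicts are not hashable, so a plain set of them is not available), and the names relation_join, suppliers and shipments are hypothetical, invented only for this illustration.

```python
from itertools import product


def join(m1, m2):
    """Tuple join: merged dict, or None if a shared key has differing values."""
    if any(m1[k] != m2[k] for k in m1.keys() & m2.keys()):
        return None
    return {**m1, **m2}


def relation_join(r1, r2):
    """Join every tuple of r1 with every tuple of r2, keeping defined results."""
    result = []
    for t1, t2 in product(r1, r2):
        joined = join(t1, t2)
        # Skip incompatible pairs, and drop duplicates to keep set semantics.
        if joined is not None and joined not in result:
            result.append(joined)
    return result


# Made-up sample data, for illustration only.
suppliers = [{"s": 1, "city": "London"}, {"s": 2, "city": "Paris"}]
shipments = [{"s": 1, "p": 10}, {"s": 1, "p": 11}, {"s": 3, "p": 12}]
print(relation_join(suppliers, shipments))
```

Only the supplier with s equal to 1 has compatible shipments, so the result holds two tuples, each carrying the keys of both inputs.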

In this wider sense, a relational (natural) join is both an intersection in one dimension and a union in the other dimension.

Now, I'm not currently asking for Relation to be implemented as a Perl 6 feature (it is actually more complicated than "set of mapping"), but if Mapping|Hash had an operator like I mentioned, it would be easier to make one on top of it; moreover, the Mapping|Hash could also implement the "heading" of a relation (a name-to-declared-type map), not just its "body" composed of tuples (each being a name-to-value map).


2.  semijoin() aka matching():

        function semijoin of Mapping (Mapping $m1, Mapping $m2) { ... }

This operator is like join() except that it will simply return $m1 if the arguments are compatible, rather than a new mapping. (This is assuming we're dealing with actual Mapping, which are immutable; depending on usage with a Hash instead, perhaps making a new Hash is desired?) Therefore, in a wider relational semijoin() context, we are simply filtering $r1 by $r2. Note that a normal join() such that $m2 is a subset of $m1 is functionally a semijoin() anyway. Also, unlike join(), semijoin() is *not* commutative.


3.  semidifference() aka not_matching():

        function semidifference of Mapping (Mapping $m1, Mapping $m2) { ... }

This operator is the complement of semijoin(), in that, given the same 2 Mapping arguments, it would return $m1 when semijoin() would return undef, and return undef when semijoin() would return $m1.
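
A minimal sketch of semijoin() and semidifference() side by side, in Python rather than Perl 6, with dicts standing in for Mappings and None for undef. The helper name compatible is hypothetical; it just factors out the shared test.

```python
def compatible(m1, m2):
    """True when every key the two mappings share has the same value in both."""
    return all(m1[k] == m2[k] for k in m1.keys() & m2.keys())


def semijoin(m1, m2):
    """Return m1 itself when the arguments are compatible, else None."""
    return m1 if compatible(m1, m2) else None


def semidifference(m1, m2):
    """The complement: return m1 exactly when semijoin() would return None."""
    return None if compatible(m1, m2) else m1


left = {"a": 1, "b": 2}
print(semijoin(left, {"b": 2, "c": 3}))        # compatible, so left comes back
print(semidifference(left, {"b": 2, "c": 3}))  # compatible, so None
print(semidifference(left, {"b": 4}))          # incompatible, so left comes back
```

Because the first argument is returned unchanged, swapping the arguments changes the result, which is why neither function is commutative.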


4.  rename():

        function rename of Mapping (Mapping $m, Str $old_k, Str $new_k) { ... }

This operator takes one Mapping argument and derives another that is identical except that one of its existing keys is replaced with a different previously-nonexisting key, and the old key's value is moved over to the new one, so just a key changed. It is invalid to rename a key to match another existing one, and should fail (not undef). Renaming a key to itself is a no-op.

This operator can be generalized so that it renames N keys rather than just one, in which case the old_k/new_k args can, eg, be replaced by a Str<Str> hash argument; in the latter case, it is valid to swap 2 key names for each other.
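
Here is a rough sketch of the generalized N-key rename(), in Python rather than Perl 6. The renames argument is a dict of old key to new key, a Python exception stands in for fail, and treating a missing old key as a failure is my reading of "existing keys" rather than something spelled out above.

```python
def rename(m, renames):
    """Return a copy of m with keys renamed per the old -> new dict `renames`."""
    missing = [old for old in renames if old not in m]
    if missing:
        raise KeyError(f"cannot rename nonexistent keys: {missing}")
    # Start from the elements that are not being renamed at all.
    result = {k: v for k, v in m.items() if k not in renames}
    for old, new in renames.items():
        # A target may only collide with a key that is itself being renamed
        # away, which is what makes swapping two key names valid.
        if new in result:
            raise ValueError(f"rename target {new!r} already exists")
        result[new] = m[old]
    return result


print(rename({"a": 1, "b": 2}, {"a": "c"}))            # single rename
print(rename({"a": 1, "b": 2}, {"a": "b", "b": "a"}))  # swap two key names
```

Removing all of the old keys before inserting any new ones is the design choice that lets a swap succeed while a rename onto an untouched key still fails.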


5.  project() aka select():

        function project of Mapping (Mapping $m, @keys_to_keep) { ... }

This operator essentially takes a slice of the Mapping, but that the result is a Mapping too, keeping the values with the projected keys. @keys_to_keep can have zero elements or all of the source Mapping's elements, but specifying a key that isn't in the source is a failure condition.


6.  remove() aka delete() aka project_all_but():

        function remove of Mapping (Mapping $m, @keys_to_remove) { ... }

This operator is the same as project() but that it projects all source elements *except* those specified in @keys_to_remove.
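
project() and remove() can be sketched together, in Python rather than Perl 6, with an exception standing in for the failure condition. Having remove() also fail on an unknown key is an assumption that follows from it being "the same as project()"; the helper name _check_keys is hypothetical.

```python
def _check_keys(m, keys):
    """Raise if any requested key is absent from the source mapping."""
    missing = [k for k in keys if k not in m]
    if missing:
        raise KeyError(f"keys not in source mapping: {missing}")


def project(m, keys_to_keep):
    """Keep only the named elements of m."""
    _check_keys(m, keys_to_keep)
    return {k: m[k] for k in keys_to_keep}


def remove(m, keys_to_remove):
    """Keep every element of m except the named ones."""
    _check_keys(m, keys_to_remove)
    return {k: v for k, v in m.items() if k not in keys_to_remove}


source = {"a": 1, "b": 2, "c": 3}
print(project(source, ["a", "c"]))
print(remove(source, ["a", "c"]))
print(project(source, []))
```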


7.  compose():

        function compose of Mapping (Mapping $m1, Mapping $m2) { ... }

This operator is to join() like symmetric_difference() on a set is to union() on a set. It is like a macro that first joins $m1 and $m2, then does a projection on the result so that only keys that were in just one of the source mappings are in the final result, and any keys in common are not.
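
As a sketch of that join-then-project reading of compose(), in Python rather than Perl 6: None again stands in for undef, and returning None when the join itself fails is an assumption carried over from join().

```python
def compose(m1, m2):
    """Join m1 and m2, then drop every key the two had in common."""
    common = m1.keys() & m2.keys()
    # Same compatibility rule as join(): shared keys must have equal values.
    if any(m1[k] != m2[k] for k in common):
        return None
    joined = {**m1, **m2}
    return {k: v for k, v in joined.items() if k not in common}


print(compose({"a": 1, "b": 2}, {"b": 2, "c": 3}))
print(compose({"a": 1, "b": 2}, {"b": 4, "c": 3}))
```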


8.  wrap():

        function wrap of Mapping (Mapping $m, Str @old_k, Str $new_k) { ... }

This operator takes one Mapping argument and derives another that is the same but that the 0..N elements with keys named by @old_k are removed, and then re-inserted as a single Mapping-typed element value whose key is $new_k. This operator fails if @old_k names any nonexistent keys, or if $new_k matches an existing key that isn't in @old_k.


9.  unwrap():

        function unwrap of Mapping (Mapping $m, Str $old_k) { ... }

This operator is the inverse of wrap(); $old_k is the name of a Mapping-typed element value, and that element is replaced by the elements from the value. This operator fails if any element keys of the value for $old_k are the same as any other keys in $m besides $old_k.
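
wrap() and unwrap() are easiest to see as a round trip, so here is a minimal sketch in Python rather than Perl 6. Nested dicts stand in for Mapping-typed values, exceptions stand in for fail, and the sample keys (name, street, city, addr) are made up for the illustration.

```python
def wrap(m, old_keys, new_key):
    """Move the elements named by old_keys into one nested dict under new_key."""
    missing = [k for k in old_keys if k not in m]
    if missing:
        raise KeyError(f"cannot wrap nonexistent keys: {missing}")
    if new_key in m and new_key not in old_keys:
        raise ValueError(f"wrap target {new_key!r} already exists")
    inner = {k: m[k] for k in old_keys}
    outer = {k: v for k, v in m.items() if k not in old_keys}
    outer[new_key] = inner
    return outer


def unwrap(m, old_key):
    """Replace the nested dict stored under old_key with its own elements."""
    inner = m[old_key]
    outer = {k: v for k, v in m.items() if k != old_key}
    clash = inner.keys() & outer.keys()
    if clash:
        raise ValueError(f"unwrapped keys collide with existing keys: {sorted(clash)}")
    return {**outer, **inner}


person = {"name": "x", "street": "s", "city": "c"}
wrapped = wrap(person, ["street", "city"], "addr")
print(wrapped)
print(unwrap(wrapped, "addr") == person)
```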


Okay, so that's more or less it.

It's possible that additional operators may be useful, but I haven't thought them through yet. (Also, some relational operators don't make sense just applied to individual tuples, and so they aren't mentioned above.)

Any feedback is appreciated. Including both appropriate names for the semantics of the operators I mentioned, and/or comparably very concise syntax for doing the same with existing Perl 6 operators.


Thank you. -- Darren Duncan
