The Sort Problem: a definitive ruling

Damian Conway Thu, 19 Feb 2004 17:56:21 -0800

The design team discussed "The Sort Problem" during yesterday's teleconference. Here is Larry's decision: final, definitive, and unalterable (well...for this week at least ;-)

-----cut---------cut---------cut---------cut---------cut---------cut----

C<sort> in Perl6 is a global multisub:

    multi sub *sort(Criterion @by: [EMAIL PROTECTED]) {...}
    multi sub *sort(Criterion $by: [EMAIL PROTECTED]) {...}
    multi sub *sort(             : [EMAIL PROTECTED]) {...}

where:

type KeyExtractor ::= Code(Any) returns Any;

type Comparator ::= Code(Any, Any) returns Int;

    type Criterion    ::= KeyExtractor
                        | Comparator
                        | Pair(KeyExtractor, Comparator)
                        ;

That means that we can call C<sort> without a block (to sort stringifically ascending with C<cmp>):

    # Stringifically ascending...
    @sorted = sort @unsorted;

or with a single two-argument block/closure (to sort by whatever the specified comparator is):

    # Numerically ascending...
    @sorted = sort {$^a <=> $^b} @unsorted;

    # Namewise stringifically descending case-insensitive...
    @sorted = sort {lc $^b.name cmp lc $^a.name}
                   @unsorted;
    # or...
    @sorted = sort {$^b.name cmp $^a.name} is insensitive
                   @unsorted;
    # or...
    @sorted = sort {$^a.name cmp $^b.name} is descending is insensitive
                   @unsorted;

    # Modtimewise numerically ascending...
    @sorted = sort {-M $^a <=> -M $^b} @unsorted;

    # Fuzz-ifically...
    sub fuzzy_cmp($x, $y) returns Int;
    @sorted = sort &fuzzy_cmp, @unsorted;

or with a single one-argument block/closure (to sort according whatever the specified key extractor returns):

    # Numerically ascending...
    @sorted = sort {+ $^elem} @unsorted;
    @sorted = sort {+ $_} @unsorted;

    # Namewise stringifically descending case-insensitive...
    @sorted = sort {~ $^elem.name} is descending is insensitive @unsorted;
    @sorted = sort {lc $^elem.name} is descending @unsorted;
    @sorted = sort {lc .name} is descending @unsorted;

    # Modtimewise numerically ascending...
    @sorted = sort {-M} @unsorted;

    # Key-ifically...
    sub get_key($elem) {...}
    @sorted = sort &get_key, @unsorted;

or with a single extractor/comparator pair (to sort according to the extracted key, using the specified comparator):

    # Modtimewise stringifically descending...
    @sorted = sort {-M}=>{$^b cmp $^a} @unsorted;

    # Namewise fuzz-ifically...
    @sorted = sort {.name}=>&fuzzy_cmp @unsorted;

or with an array of comparators and/or key extractors and/or extractor-comparator pairs (to sort according to a cascading list of criteria):

    # Numerically ascending
    # or else namewise stringifically descending case-insensitive
    # or else modtimewise numerically ascending
    # or else namewise fuzz-ifically
    # or else fuzz-ifically...
    @sorted = sort [ {+ $^elem},
                     {$^b.name cmp $^a.name} is insensitive,
                     {-M},
                     {.name}=>&fuzzy_cmp,
                     &fuzzy_cmp,
                   ],
                   @unsorted;

If a key-extractor block returns number, then C<< <=> >> is used to compare those keys. Otherwise C<cmp> is used. In either case, the keys extracted by the block are cached within the call to C<sort>, to optimize subsequent comparisons against the same element. That is, a key-extractor block is only ever called once for each element being sorted.

If a key-extractor/comparator pair is specified, the key-extractor is the key of the pair and the comparator the value. The extractor is used to retreive keys, which are then passed to the comparator.

The C<is descending> and C<is insensitive> traits on a key extractor or a comparator are detected within the call to C<sort> (or possibly by the compiler) and used to modify the case-sensitivity and "direction" of any comparison operators used for the corresponding key or in the corresponding comparator.

Note that ambiguous cases like:

    @sorted = sort {-M}, {-M}, {-M};
    @sorted = sort {$^a <=> $^b}, {$^a <=> $^b}, {$^a <=> $^b};
    @sorted = sort [...], [...], [...];
    # etc.

will be dispatched according to the normal multiple dispatch semantics
(which will mean that they will mean):

    @sorted = sort {-M}          <== {-M}, {-M};
    @sorted = sort {$^a <=> $^b} <== {$^a <=> $^b}, {$^a <=> $^b};
    @sorted = sort [...]         <== [...], [...];
    # etc.

and so one would need to write:

    @sorted = sort <== {-M}, {-M}, {-M};
    @sorted = sort <== {$^a <=> $^b}, {$^a <=> $^b}, {$^a <=> $^b};
    @sorted = sort <== [...], [...], [...];
    # etc.

to get C<cmp> comparison on all the arguments.

-----cut---------cut---------cut---------cut---------cut---------cut----

Thanks to everyone who contributed to this discussion (especially Uri). As you see, the result is sort facility that is simultaneously much more powerful, much easier-to-use in the simple cases, has the potential to produce more maintainable code, and which has much better scope for internal optimization.

"Once again the Iron Designer rises to the supreme challenge of the Mailinglist Stadium and expresses the true spirit of Perl 6!!!"

Zenzai!

Damian

The Sort Problem: a definitive ruling

Reply via email to