>>>>> "DC" == Damian Conway <[EMAIL PROTECTED]> writes:
DC> Suppose C<sort>'s signature is:
DC> type KeyExtractor ::= Code(Any) returns Any;
DC> type Comparator ::= Code(Any, Any) returns Int;
DC> type Criterion ::= KeyExtractor
DC> | Comparator
DC> | Pair(KeyExtractor, Comparator)
DC> ;
DC> type Criteria ::= Criterion
DC> | Array of Criterion
DC> ;
DC> multi sub *sort(Criteria ?$by = {$^a cmp $^b}, [EMAIL PROTECTED]) {...}
DC> Each of those sort criteria may be a block/closure, or a pair of
DC> block/closures. Each block/closure may take either one argument (in
DC> which case it's a key extractor) or two arguments (in which case it's
DC> a comparator).
DC> If a key-extractor block returns number, then C<< <=> >> is used to
DC> compare those keys. Otherwise C<cmp> is used. In either case, the
DC> returned keys are cached to optimize subsequent comparisons against
DC> the same element.
i would make cmp the default as it is now. you sort strings much more
often than you sort numbers.
DC> If a key-extractor/comparator pair is specified, the key-extractor is
DC> the key of the pair and the comparator the value. The extractor is
DC> used to retreive (and cache) keys, which are then passed to the
DC> comparator.
DC> Which means that (a slightly extended version of) Uri's proposed:
DC> @out = sort
DC> key( %lookup{ .{remotekey} } ), #1
DC> key:insensitive:descending:locale( 'somewhere' )( .{priority} ), #2
DC> key:float ( substr( 0, 10 ) ), #3
DC> key:integer ( /foo(\d+)bar/ ), #4
DC> key:compare( { ^$a <=> ^$b } )( /(\d+)$/ ), #5
DC> key:compare( \&my_compare_sub ) ( /(\d+)$/ ), #6
DC> key:compare( { just_guess $^b, $^a } ), #7
DC> @in;
DC> would become:
DC> @out = sort
DC> [ { ~ %lookup{ .{remotekey} } }, #1
if string cmp is the default, wouldn't that ~ be redundant?
and i keep forgetting that ~ is now the stringify operator :)
DC> { use locale 'somewhere'; lc .{priority} } is descending, #2
hmmm, this needs more work IMO. requiring the coder to lc the key is
moving the case insensitivity feature back into code. wouldn't 'is
insensitive' be ok?
how does the internal guts get the descending/insensitive flags/traits
passed to it? i know it is an internals problem but i am just pondering
it.
DC> { + substr( 0, 10 ) }, #3
DC> { int /foo(\d+)bar/ }, #4
i would also expect int to be a default over float as it will be used
more often. + is needed there since the regex returns a string. in the
#3 case that would be an int as well. so we need a 'float' cast
thingy. BTW, the only way to get a number as a key is from a structure
where the field was assigned as a number/int. that may not happen a lot
so the int/float cast will probably be needed here to sort correctly
DC> { + m/(\d+)$/.[1] }, #5
DC> { /(\d+)$/ } => &my_compare_sub, #6
missing a close } it seems.
DC> { just_guess $^b, $^a }, #7
is that a reverse order sort? why not skip the args and do this:
{ &just_guess is descending }, #7
DC> ],
so the first arg to sort is either a single compare block or a anon list
of them? i figure we need the [] to separate the criteria from the input
data. but what about this odd case,
sort [...], [...], [...]
now that is stupid code but it could be trying to sort the refs by their
address in string mode. or it could be a sort criteria list followed by
2 refs to input records.
DC> @in;
DC> So to specify a key extractor we provide a block with one argument, as
DC> in examples #1 through #5. Then to specify that the key is compared
DC> with C<cmp> we return a string (using unary C<~> to be explicit about
DC> it) as in #1. To specify that the key is compared with C<< <=> >> we
DC> return a number (using unary C<+> to be explicit about it) as in #3
DC> and #5. If we think that C<sort> might be able to optimize integer
DC> comparisons, we can explicitly return an integer, as in #4. If we want
DC> locale-sensitive sorting, we specify that with the appropriate C<use
locale> statement, as in #2.
DC> To specify a comparator, we provide a block with two arguments, as in
DC> #7. That block is always expected to return an integer.
so #7 is a call to just_guess which is passed the 2 args to compare. it
must return an int like cmp/<=>. as i pointed out above, i don't see why
you even need to show the ^$a and ^$b args? they will be passed into
just_guess that way. let is descending handle the sort ordering.
DC> To specify a key-extractor *and* an associated comparator, we bind
DC> them together in a Pair, as in #6.
that works for me.
DC> The only tricky bits are how to specify a key extractor for keys that
DC> are to be sorted in descending order and/or case-insensitively. The
DC> case insensitivity could be handled by simple C<lc>'ing, as in
DC> #2. Descending sort order could be handled by a trait on the
DC> block/closure, as in #2. Alternatively, case-insensitivity could be
DC> specified by trait as well.
ok, i like a trait better than coding in lc. better semantics and
clearer to read. the is descending trait could just cause the sort
callbacks to be called with reversed arguments and so the callback code
never has to worry about that.
DC> Note that simple sorts would be unchanged:
DC> @sorted = sort @unsorted;
DC> @sorted = sort {$^b <=> $^a} @unsorted;
DC> But now one could also very neatly code cases that formerly required
DC> an Orcish or Transformed sort. For example, these:
DC> @sorted = sort {(%M{$^a}//-M $^a) <=> (%M{$^b}//-M $^b)} @unsorted;
wow, that is UGLY! but i get it after a few hours of study. :) just the
orcish maneuver but with //. i think you also mean //= there.
DC> @sorted = map $_[1], sort {$^a[0] <=> $^b[0]}, map [-M,$_],
DC> @unsorted;
the familiar ST.
DC> would both become:
DC> @sorted = sort {-M} @unsorted;
that still wants to be cached somehow as -M is expensive. but that is an
internal thing and the ST/GRT both can handle it. so -M there is a
simple key extraction on the files in @unsorted. assuming no internal
caching would this be correct?
@sorted = sort {%M{$_} //= -M} @unsorted;
i assume //= will be optimized and -M won't be called if it is cached.
also where does %M get declared and/or cleared before this? can it be
done in the block:
@sorted = sort {my %M ; %M{$_} //= -M} @unsorted;
another -M problem is that it currently returns a float so that must be
marked/cast as a float.
@sorted = sort {float -M} @unsorted;
that cast causes a <=> compare (since i would rather see cmp be the
default) and also can let the internal GRT properly merge a float key
instead of an int. it should work without it (not sure how) but the
float marker is important to optimization and a proper comparison.
maybe the fact that the compiler knows -M returns a float can be used to
mark it internally and the explicit float isn't needed here. but data
from a user record will need to be marked as float as the compiler can't
tell.
anyhow, i am glad i invoked your name and summoned you into this
thread. :)
uri
--
Uri Guttman ------ [EMAIL PROTECTED] -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org