Re: The Sort Problem

Damian Conway Sun, 15 Feb 2004 18:09:58 -0800

Uri wrote:

  DC> If a key-extractor block returns number, then C<< <=> >> is used to
  DC> compare those keys. Otherwise C<cmp> is used. In either case, the
  DC> returned keys are cached to optimize subsequent comparisons against
  DC> the same element.

i would make cmp the default as it is now.

Err. That's kinda what "Otherwise C<cmp> is used" means. ;-)

  DC>     @out = sort
  DC>        [ { ~ %lookup{ .{remotekey} } },                                 #1

if string cmp is the default, wouldn't that ~ be redundant?

How do you know that the values of %lookup are strings?
How would the optimizer know?

  DC>          { + substr( 0, 10 ) },                                         #3
  DC>          { int /foo(\d+)bar/ },                                         #4

i would also expect int to be a default over float as it will be used
more often. + is needed there since the regex returns a string. in the
#3 case that would be an int as well. so we need a 'float' cast
thingy.

Unary C<+> *is* the "float cast thingy"!

BTW, the only way to get a number as a key is from a structure
where the field was assigned as a number/int. that may not happen a lot
so the int/float cast will probably be needed here to sort correctly

That's the point.

If you want to force numeric comparison of keys you explicitly cast each key to number using unary C<+> or C<int>. If you want to force stringific comparison you explicitly cast the key to string using unary C<~>.

If you don't explicitly cast either way, C<sort> just DWIMs by looking at the actual type of the keys returned by the extractor. If any of them isn't a number, it defaults to C<cmp>.
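
As a rough illustration of that rule, here is a sketch in Python rather than in the proposed Perl 6 syntax. The name dwim_sort and the sample data are invented for the example, and this is only one reading of the behaviour described above: every key is extracted once, numeric comparison is used only if all the keys are numbers, and string comparison is the fallback.

```python
from functools import cmp_to_key
from numbers import Number

def dwim_sort(items, extract):
    """Sort items by an extracted key, picking the comparison from the key types.

    Hypothetical helper: it emulates the rule described in the text and is
    not an actual Perl 6 API.
    """
    # Extract each key exactly once and keep it next to its element.
    keyed = [(extract(item), item) for item in items]

    # Numeric comparison only if every key is a number (bool excluded).
    all_numeric = all(
        isinstance(k, Number) and not isinstance(k, bool) for k, _ in keyed
    )
    if all_numeric:
        compare = lambda a, b: (a > b) - (a < b)                      # role of <=>
    else:
        compare = lambda a, b: (str(a) > str(b)) - (str(a) < str(b))  # role of cmp

    keyed.sort(key=cmp_to_key(lambda x, y: compare(x[0], y[0])))
    return [item for _, item in keyed]

records = ["10", "9", "100"]
as_strings = dwim_sort(records, lambda s: s)        # no cast: string keys
as_numbers = dwim_sort(records, lambda s: int(s))   # explicit cast, like unary +
```

With string keys the result is in string order ("10", "100", "9"); with the explicit cast it is in numeric order ("9", "10", "100").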

  DC>          { + m/(\d+)$/.[1] },                                           #5
  DC>          { /(\d+)$/ } => &my_compare_sub,                               #6

missing a close } it seems.

Yup. Thanks.

DC> { just_guess $^b, $^a }, #7

is that a reverse order sort? why not skip the args and do this:

{ &just_guess is descending }, #7

Because I wanted to show a plain old two-parameter block being used as a *comparator* (not a one-parameter block being used as a key extractor).
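
The distinction can be sketched in Python (again, not the proposed Perl 6 syntax; sort_by and the sample words are made up for illustration). The sketch assumes that the number of parameters alone decides the role: one parameter means key extractor, two parameters means comparator.

```python
from functools import cmp_to_key
from inspect import signature

def sort_by(criterion, items):
    """Treat a one-parameter callable as a key extractor and a
    two-parameter callable as a comparator (hypothetical helper)."""
    arity = len(signature(criterion).parameters)
    if arity == 1:
        return sorted(items, key=criterion)              # key extractor
    if arity == 2:
        return sorted(items, key=cmp_to_key(criterion))  # comparator
    raise TypeError("criterion must take one or two parameters")

words = ["pear", "fig", "banana"]

# One parameter: extracts a key (the length) from each element.
by_length = sort_by(lambda w: len(w), words)

# Two parameters: compares two elements directly, arguments reversed.
reversed_cmp = sort_by(lambda a, b: (b > a) - (b < a), words)
```

Here by_length is ordered by word length, while reversed_cmp is in descending string order because the comparator swaps its arguments.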

so the first arg to sort is either a single compare block or an anon list of them? i figure we need the [] to separate the criteria from the input data.

Yep.

but what about this odd case,

sort [...], [...], [...]

now that is stupid code but it could be trying to sort the refs by their
address in string mode.

In which case we probably should have written it:

sort <== [...], [...], [...]

or it could be a sort criteria list followed by
2 refs to input records.

Only if the first array ref contains nothing but Criterion objects.

  DC> To specify a comparator, we provide a block with two arguments, as in
  DC> #7. That block is always expected to return an integer.

so #7 is a call to just_guess which is passed the 2 args to compare. it must return an int like cmp/<=>.

Yep.

as i pointed out above, i don't see why you even need to show the $^a and $^b args?

So the block knows it has two parameters.

they will be passed into just_guess that way. let is descending handle the sort ordering.

But you *can't* apply C<is descending> to a Code reference.

Nor are we sure that the order *is* descending. Maybe the C<just_guess> predicate is invariant to argument order and there were other reasons to pass the args in that order. Or maybe we reversed the order because we know that C<just_guess> never returns zero, but defaults to its second argument being smaller, in which case we needed to reverse the args so that the C<sort> remained stable.

The point is that I wanted to show a vanilla two-parameter compare block. (And, boy, am I ever sorry I whimsically reversed the args to indicate generality ;-)

DC> @sorted = sort {(%M{$^a}//-M $^a) <=> (%M{$^b}//-M $^b)} @unsorted;

wow, that is UGLY! but i get it after a few hours of study. :) just the orcish maneuver but with //. i think you also mean //= there.

Yup. Should indeed be //=

DC> @sorted = sort {-M} @unsorted;

that still wants to be cached somehow as -M is expensive.

It *is* cached. It's a one-parameter block. So it's a key extractor. So it automagically caches the keys it extracts.

so -M there is a simple key extraction on the files in @unsorted.

Yup.

assuming no internal caching

Key extractors will always cache.
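
To see why that matters for something as expensive as -M, here is a small Python sketch (not Perl 6). The age function is a hypothetical stand-in for -M and the file names and ages are invented. It relies on the fact that Python's sorted calls its key function once per element, which is the same effect described here for key extractors.

```python
from functools import cmp_to_key

AGES = {"a.txt": 3.5, "b.txt": 0.25, "c.txt": 12.0, "d.txt": 1.0}  # invented data
counter = {"lookups": 0}

def age(name):
    """Hypothetical stand-in for an expensive -M file test."""
    counter["lookups"] += 1
    return AGES[name]

files = list(AGES)

# Key-extractor style: the key is computed once per element and reused.
counter["lookups"] = 0
by_key = sorted(files, key=age)
key_lookups = counter["lookups"]

# Comparator style: the expensive call is repeated inside every comparison.
counter["lookups"] = 0
by_cmp = sorted(
    files,
    key=cmp_to_key(lambda a, b: (age(a) > age(b)) - (age(a) < age(b))),
)
cmp_lookups = counter["lookups"]
```

Both calls produce the same order, but key_lookups equals the number of files while cmp_lookups grows with the number of comparisons.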

@sorted = sort {%M{$_} //= -M} @unsorted;

i assume //= will be optimized and -M won't be called if it is cached.

also where does %M get declared and/or cleared before this?

Exactly the problem. That's why key extractors always cache.

can it be done in the block:

@sorted = sort {my %M ; %M{$_} //= -M} @unsorted;

If you'd gone insane and particularly wanted to do it that way, you'd need something like:

@sorted = sort {state %M ; %M{$_} //= -M} @unsorted;

to ensure the cache persisted between calls to the key extractor.
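
The difference between the two declarations can be mimicked in Python (a sketch only; make_extractors, with_my and with_state are invented names, and a membership test stands in for //=). A cache created inside the block is empty on every call, so it never saves any work; a cache that lives outside the call does.

```python
def make_extractors(age):
    """Build two key extractors around a hypothetical expensive function."""

    def with_my(name):
        cache = {}                      # like `my %M`: fresh on every call
        if name not in cache:
            cache[name] = age(name)
        return cache[name]

    persistent = {}                     # like `state %M`: kept between calls

    def with_state(name):
        if name not in persistent:
            persistent[name] = age(name)
        return persistent[name]

    return with_my, with_state

calls = []

def age(name):
    """Hypothetical stand-in for -M; records each time it is invoked."""
    calls.append(name)
    return 1.0

with_my, with_state = make_extractors(age)

with_my("a.txt")
with_my("a.txt")
my_calls = len(calls)                   # the expensive call ran both times

calls.clear()
with_state("a.txt")
with_state("a.txt")
state_calls = len(calls)                # the second call was served from cache
```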

another -M problem is that it currently returns a float so that must be
marked/cast as a float.

@sorted = sort {float -M} @unsorted;

No. *Because* -M returns a number, C<sort> automatically knows to use numeric comparison on those keys.

maybe the fact that the compiler knows -M returns a float can be used to
mark it internally and the explicit float isn't needed here.

Exactly.

but data from a user record will need to be marked as float as the
compiler can't tell.

It *can* tell if the elements are typed. But, yes, most of the time if you want to ensure numeric comparison you will explicitly prefix with a C<+> to give the compiler a hint. Otherwise C<sort> will have to fall back on looking at the keys that are extracted and working out at run-time which type of comparison to use (kinda like the smartmatch operator does).

anyhow, i am glad i invoked your name and summoned you into this
thread. :)

Well, that makes *one* of us.

;-)

Damian
