Tim Peters <t...@python.org> added the comment:

It's rare that an optimization is a _pure_ win. Some cases win, others lose. 
There's almost never "a proof" of net gain ending with "QED".

Of course it's dead easy to construct examples where "many duplicates in the 
first tuple position" rules, and even easy to construct realistic examples.
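
For concreteness, here's a toy construction of that kind of data (the names and sizes are mine, purely for illustration):

    import random

    # Only a few distinct values in the first position, so almost every
    # tuple compare has to fall through to the second element.
    depts = ["eng", "ops", "hr"]
    records = [(random.choice(depts), random.random()) for _ in range(100_000)]
    records.sort()   # tuple-key compares dominated by equal first elements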

But, as you already said in the SO report, in those cases it's a better idea to 
do multiple single-key sorts instead of a single multiple-keys-in-a-tuple sort. 
The base sorting algorithm can exploit duplicates in a single-key sort - 
indeed, the more duplicates there are, the more benefit it can get.
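
That's the usual stable-sort idiom (sort on the least-significant key first; stability makes the passes compose). A sketch, with made-up data of the kind above:

    from operator import itemgetter

    rows = [("eng", "carol"), ("hr", "alice"), ("eng", "bob"), ("hr", "dave")]

    # One tuple-key sort would be:
    #     rows.sort(key=itemgetter(0, 1))
    # The equivalent pair of single-key sorts, least-significant key first.
    # list.sort() is stable, so ties on the second pass preserve the order
    # established by the first, and each pass hands the algorithm a single
    # key whose runs of duplicates it can exploit directly.
    rows.sort(key=itemgetter(1))    # secondary key
    rows.sort(key=itemgetter(0))    # primary key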

The sorting HOWTO could do a service by discussing this for CPython's sorting 
algorithm - it's not obvious, and can have major effects.

In the meantime, I'm only aiming to speed tuple-key sorts in cases where they're 
well-suited to begin with: the first tuple elements _are_ mostly distinct. 
Giving up a significant win for the relative handful of people who know how to 
exploit the algorithm is not worth it, to me, to avoid making suboptimal uses 
even less optimal.

I'm more a pragmatist than a utopian ;-)

Extending the idea to positions beyond the first is dubious on those grounds: 
if the first tuple positions in fact often match, then the optimization has 
already backfired. Time to admit defeat then, not double down on failed 
arrogance ;-)

One idea I'm playing with: keep track of how often, during a tuple-key sort, a 
compare _is_ resolved by the first elements. Then adjust 
`unsafe_tuple_compare()` to switch to "the other" approach when "the current" 
approach isn't working out.
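
Here's a pure-Python cartoon of that idea, nothing like the real C code in Objects/listobject.c, and with invented names, just to show what "keeping track" might mean:

    class CountingTupleKey:
        """Wrap a tuple; count how often the first element decides a compare."""
        resolved_by_first = 0
        total = 0

        def __init__(self, t):
            self.t = t

        def __lt__(self, other):
            cls = CountingTupleKey
            cls.total += 1
            a, b = self.t, other.t
            if a[0] != b[0]:
                cls.resolved_by_first += 1
                return a[0] < b[0]
            return a[1:] < b[1:]    # first elements tied; compare the rest

    data = [(1, i) for i in range(10, 0, -1)]
    data.sort(key=CountingTupleKey)
    rate = CountingTupleKey.resolved_by_first / CountingTupleKey.total
    print(f"compares resolved by the first element: {rate:.0%}")
    # A persistently low rate would be the signal to switch strategies.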

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue45530>
_______________________________________