[issue45530] Improve listobject.c's unsafe_tuple_compare()

Tim Peters Thu, 21 Oct 2021 18:24:24 -0700


Tim Peters <t...@python.org> added the comment:


> I see you mentioned that PyObject_RichCompareBool(..., Py_EQ) might be
> faster for example because it checks identity.

For example, in tupsort.py replace

    xs = [random() for _ in range(length)]

with

    xs = ['z' * 100 for _ in range(length)]

Then sorting that directly is _slower_ than wrapping each string in a 1-tuple 
first. That's so today, and also so in the latest version of the PR (but not in 
the original PR code).

> Why not do an identity check before the ms->tuple_elem_compare calls? Too
> expensive and rarely successful?

It's a cheap test, but, ya, "rarely successful" outside of special cases. 
PyObject_RichCompareBool() already does that special-casing now, and the PR now 
adjusts to use PyObject_RichCompareBool(Py_EQ) instead when the pair of 
ms->tuple_elem_compare calls isn't paying off. It's enough that one part of the 
chain rarely pays off, but gets nearly all the potential benefit when it does 
pay off. Duplicating the special-casing would double the overhead with scant 
additional payoff when it pays.

> I sorted a million tuples where both first and second element
> are randomly chosen from a population of 10,000.

So you expect about 100 repetitions of each value in each tuple position.

> So their amounts of duplication were the same. But these are the
> statistics from sorting:
> - First position:   18,603,981 equality comparisons, 29.87% equal
> - Second position:   5,556,203 equality comparisons,  0.09% equal

Well, for any given fixed value in the first tuple position, you expect about 
100 tuples total to have that value in the first position, and the second 
position in those contains a random sample (with repetition) of 100 elements 
out of 10,000. It's not surprising the 2nd positions are rarely equal _given_ 
that the universe has been reduced to the 100 tuples with the same first 
element.

In any case, I don't see the point to this exercise ;-)

BTW, don't overlook that tuple_elem_compare can _only_ be used on the very 
first elements of the tuples. The pre-scan developed no idea whatsoever of the 
types of tuple elements other than the first.


> One more idea: What do you think of sorting lists of tuples
> (or tuple keys) in two stages?

Primarily that I'm not looking for a term project here - I'm looking to get an 
easy win from simple changes to one function.

What's there in the PR now is designed to play well with how sorting works. 
When there are many duplicates, a merge will find about 7 comparison attempts 
in a row where the first elements are equal, and the code adjusts to use 
PyObject_RichCompareBool(Py_EQ) for at least all but the first compare. 
Galloping mode will continue with that for a brief time at the start, but 
probably soon hit a chain of cases all of which can be resolved solely with 
tuple_elem_compare calls, and the code adjusts to that too, never using 
PyObject_RichCompareBool(Py_EQ) again after the first time it sees that return 
"no, not equal".

OTOH, when there are no duplicates, PyObject_RichCompareBool(Py_EQ) isn't 
helpful, and in fact will never be called.

> 1) Sort the list only by first position (comparing with only one
>    tuple_elem_compare).

Not sure what that means, exactly. (the "with only one tuple_elem_compare" 
part; I know what it means to sort by the first position)

> 2) Identify equal-first-position-runs (with tuple_elem_compare) and
> sort each run independently (comparing only positions 1+).

What are you aiming at? There was no motivation given here.

> Some experiments I did with this looked good, not sure if too
> off-topic to post here...

Will, this issue report is about unsafe_tuple_compare(), so, ya, if the changes 
you envision aren't limited to that, I'd say you want to open a different issue 
report and a different PR :-)

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue45530>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue45530] Improve listobject.c's unsafe_tuple_compare()

Reply via email to