Stefan Pochmann <[email protected]> added the comment:
> It's not surprising the 2nd positions are rarely equal _given_ that the
> universe has been reduced to the 100 tuples with the same first element.
Yes, exactly.
> In any case, I don't see the point to this exercise ;-)
Just to illustrate why I disagree with the comment I responded to there. I'm not
convinced that many matches at the first tuple position really indicate that
there will be many matches at the second position as well, so that one should
"admit defeat" and that it's "arrogance" to hope for only a few matches at the
second position. I'd wager it's rather typical that the matching rate at the
second position is significantly lower. If you sort people by last and then
first name, I think you'd get the same effect as in my random example above,
for the same reason. If you sort a set of 2D coordinates, you'll even *never*
get matches at the second position. Even with that SO question's data and its
massive duplication (especially at the second position) *and* its correlation
between the first and second positions (since both "sorted(x)" and "x[::2]" are
derived from the same x) and its 43% matching rate at the first position,
there's still only a 27% matching rate at the second position (lower than the
1/3 break-even point).
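
Here's a rough, hypothetical way to gauge those rates yourself: wrap the pairs
in a small class whose __lt__ counts, among the comparisons sorted() actually
performs, how often the first elements tie and how often the second elements
also tie given a first-position tie. The Pair class and the stats dict are just
illustrative names, and this counts per tuple comparison rather than per
PyObject_RichCompareBool call, so it only gauges the rates, not CPython's exact
call counts:

    import random

    stats = {"compares": 0, "first ==": 0, "second == (given first ==)": 0}

    class Pair:
        __slots__ = ("t",)
        def __init__(self, t):
            self.t = t
        def __lt__(self, other):
            # Count a comparison and check for ties at the first and
            # (given a first-position tie) second positions.
            stats["compares"] += 1
            if self.t[0] == other.t[0]:
                stats["first =="] += 1
                if self.t[1] == other.t[1]:
                    stats["second == (given first ==)"] += 1
            return self.t < other.t

    random.seed(12345)
    pop = 100_000
    data = [(random.randrange(pop), random.randrange(pop))
            for _ in range(1_000_000)]
    sorted(map(Pair, data))
    print(stats)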
> BTW, don't overlook that tuple_elem_compare can _only_ be used on the very
> first elements of the tuples.
Yes, that's why I only talked about PyObject_RichCompareBool there.
> > 1) Sort the list only by first position (comparing with only
> > one tuple_elem_compare).
> Not sure what that means, exactly. (the "with only one tuple_elem_compare"
> part;
Just wanted to make clear it wouldn't use two tuple_elem_compare calls or a
PyObject_RichCompareBool.
> > 2) Identify equal-first-position-runs (with tuple_elem_compare)
> > and sort each run independently (comparing only positions 1+).
> What are you aiming at? There was no motivation given here.
Far fewer comparisons, especially PyObject_RichCompareBool comparisons (a rough
Python sketch of the groupsort idea follows below the table). For example, when
sorting a million pairs whose first and second elements are chosen from a
population of 100,000, I get these numbers of comparisons (per element):
                  current   groupsort
    ---------------------------------
    == first       18.60     none      (PyObject_RichCompareBool)
    <  first       16.19     19.60     (tuple_elem_compare)
    == second       2.41      2.36     (PyObject_RichCompareBool)
    <  second       2.41      2.36     (PyObject_RichCompareBool)
    ---------------------------------
    total slow     23.42      4.72     (PyObject_RichCompareBool)
    total fast     16.19     19.60     (tuple_elem_compare)
    ---------------------------------
    total          39.61     24.32
(With "current" I mean before your optimizations, which I can't test yet.)
> Will, this issue report is about unsafe_tuple_compare(), so, ya, if the
> changes you envision aren't limited to that, I'd say you want to open a
> different issue report and a different PR :-)
Ok, I might try that.
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue45530>
_______________________________________