Stefan Pochmann <[email protected]> added the comment:
> It's not surprising the 2nd positions are rarely equal _given_ that the
> universe has been reduced to the 100 tuples with the same first element.
Yes, exactly.
> In any case, I don't see the point to this exercise ;-)
Just to illustrate why I disagree with the comment I responded to there. I'm not
convinced that many matches at the first tuple position really indicate that
there will be many matches at the second position as well, so that one should
"admit defeat" and that it's "arrogance" to hope for only a few matches at the
second position. I'd wager it's rather typical that the matching rate at the
second position is significantly lower. If you sort people by last and then
first name, I think you'd get the same effect as in my random example above,
for the same reason. If you sort a set of 2D coordinates, you'll even *never*
get matches at the second position. Even with that SO question's data and its
massive duplication (especially at the second position) *and* its correlation
between the first and second positions (since both "sorted(x)" and "x[::2]" are
derived from the same x) and its 43% matching rate at the first position,
there's still only a 27% matching rate at the second position (lower than the
1/3 break-even point).
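
Here's a rough, hypothetical way to gauge those rates yourself: wrap the pairs
in a small class whose __lt__ counts, among the comparisons sorted() actually
performs, how often the first elements tie and how often the second elements
also tie given a first-position tie. The Pair class and the stats dict are just
illustrative names, and this counts per tuple comparison rather than per
PyObject_RichCompareBool call, so it only gauges the rates, not CPython's exact
call counts:

    import random

    stats = {"compares": 0, "first ==": 0, "second == (given first ==)": 0}

    class Pair:
        __slots__ = ("t",)
        def __init__(self, t):
            self.t = t
        def __lt__(self, other):
            # Count a comparison and check for ties at the first and
            # (given a first-position tie) second positions.
            stats["compares"] += 1
            if self.t[0] == other.t[0]:
                stats["first =="] += 1
                if self.t[1] == other.t[1]:
                    stats["second == (given first ==)"] += 1
            return self.t < other.t

    random.seed(12345)
    pop = 100_000
    data = [(random.randrange(pop), random.randrange(pop))
            for _ in range(1_000_000)]
    sorted(map(Pair, data))
    print(stats)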
> BTW, don't overlook that tuple_elem_compare can _only_ be used on the very
> first elements of the tuples.
Yes, that's why I only talked about PyObject_RichCompareBool there.
> > 1) Sort the list only by first position (comparing with only
> > one tuple_elem_compare).
> Not sure what that means, exactly. (the "with only one tuple_elem_compare"
> part;
Just wanted to make clear it wouldn't use two tuple_elem_compare calls or a
PyObject_RichCompareBool.
> > 2) Identify equal-first-position-runs (with tuple_elem_compare)
> > and sort each run independently (comparing only positions 1+).
> What are you aiming at? There was no motivation given here.
Far fewer comparisons, especially PyObject_RichCompareBool comparisons (a rough
Python sketch of the groupsort idea follows below the table). For example, when
sorting a million pairs whose first and second elements are chosen from a
population of 100,000, I get these numbers of comparisons (per element):
                  current   groupsort
    ---------------------------------
    == first       18.60     none      (PyObject_RichCompareBool)
    <  first       16.19     19.60     (tuple_elem_compare)
    == second       2.41      2.36     (PyObject_RichCompareBool)
    <  second       2.41      2.36     (PyObject_RichCompareBool)
    ---------------------------------
    total slow     23.42      4.72     (PyObject_RichCompareBool)
    total fast     16.19     19.60     (tuple_elem_compare)
    ---------------------------------
    total          39.61     24.32
(With "current" I mean before your optimizations, which I can't test yet.)
> Will, this issue report is about unsafe_tuple_compare(), so, ya, if the
> changes you envision aren't limited to that, I'd say you want to open a
> different issue report and a different PR :-)
Ok, I might try that.
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue45530>
_______________________________________