Just submitted a PR implementing this:
https://github.com/python/cpython/pull/582
-- just need someone to review it now :)

Thanks for all your feedback, everyone!

On Sun, Mar 5, 2017 at 12:19 AM Elliot Gorokhovsky <
elliot.gorokhov...@gmail.com> wrote:

> (Summary of results: my patch at https://bugs.python.org/issue28685 makes
> list.sort() 30-50% faster in common cases, and at most 1.5% slower in the
> uncommon worst case.)
>
> Hello all,
>
> You may remember seeing some messages on here about optimizing list.sort()
> by exploiting type-homogeneity: since comparing apples and oranges is
> uncommon (though possible, i.e. float to int), it pays off to check if the
> list is type-homogeneous (as well as homogeneous with respect to some other
> properties), and, if it is, to replace calls to PyObject_RichCompareBool
> with calls to ob_type->tp_richcompare (or, in common special cases, to
> optimized compare functions). The entire patch is less than 250 lines of
> code, most of which is pretty boilerplate (i.e. a lot of assertions in
> #ifdef Py_DEBUG blocks, etc).
>
> I originally wrote that patch back in November. I've learned a lot since
> then, both about CPython and about mailing list etiquette :). Previous
> discussion about this can be found at
> https://mail.python.org/pipermail/python-dev/2016-October/146648.html and
> https://mail.python.org/pipermail/python-ideas/2016-October/042807.html.
>
> Anyway, I recently redid the benchmarks much more rigorously (in
> preparation for presenting this project at my school's science fair),
> achieving a standard deviation of less than 0.5% of the mean for all
> measurements. The exact benchmark script used can be found at
> https://github.com/embg/python-fastsort-benchmark (it's just sorting
> random lists of/lists of tuples of [type]. While listsort.txt talks about
> benchmarking different kinds of structured lists, instead of just random
> lists, the results here would hold in those cases just as well, because
> this makes individual comparisons cheaper, instead of reducing the number
> of comparisons based on structure).
>
> I also made a poster describing the optimization and including a pretty
> graph displaying the benchmark data:
> https://github.com/embg/python-fastsort-benchmark/blob/master/poster.pdf.
> For those who would rather read the results here (though it is a *really*
> pretty graph):
>
> ***
> Percent improvement for sorting random lists of [type]
> (1-patched/unpatched):
> float: 48%
> bounded int (magnitude smaller than 2^32): 48.4%
> latin string (all characters in [0,255]): 32.7%
> general int (reasonably uncommon?): 17.2%
> general string (reasonably uncommon?): 9.2%
> tuples of float: 63.2%
> tuples of bounded int: 64.8%
> tuples of latin string: 55.8%
> tuples of general int: 50.3%
> tuples of general string: 44.1%
> tuples of heterogeneous: 41.5%
> heterogeneous (lots of float with an int at the end; worst-case): -1.5%
> ***
>
> Essentially, it's a gamble where the payoff is 20-30 times greater than
> the cost, and the odds of losing are very small. Sorting is perhaps not a
> bottleneck in most applications, but considering how much work has gone
> into Python's sort (Timsort, etc; half of listobject.c is sort code), I
> think it's interesting that list.sort() can be made essentially twice
> faster by a relatively simple optimization. I would also add that Python
> dictionaries already implement this optimization: they start out optimizing
> based on the assumption that they'll only be seeing string keys, checking
> to make sure that assumption holds as they go. If they see a non-string
> key, they permanently switch over to the general implementation. So it's
> really the same idea, except here it doesn't matter as much what type we're
> dealing with, which is important, because lists are commonly used with lots
> of different types, as opposed to dictionaries, which overwhelmingly
> commonly use string keys, especially internally. (Correct me if I'm wrong
> in any of the above).
>
> I got a lot of great feedback/input on this patch as I was writing it, but
> after submitting it, I didn't hear much from anybody. (The reason I took so
> long to post was because I wanted to wait until I had the chance to do the
> benchmarks *right*). What do you all think?
>
> Thanks,
> Elliot
>
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to