On Sun, Sep 11, 2016 at 5:45 PM, Elliot Gorokhovsky <elgo8...@colorado.edu> wrote: > I am interested in making a non-trivial improvement to list.sort(), but > before I put in the work, I want to test the waters and see if this is > something the community would accept. Basically, I want to implement radix > sort for lists of strings. So list.sort() would detect if it is sorting a > list of strings (which is one of the more common things you sort in python) > and, if so, use in-place radix sort (see > https://xlinux.nist.gov/dads/HTML/americanFlagSort.html). In-place radix > sort is significantly faster for lexicographic sorting than Timsort (or in > general any comparison-based sort, since radix can beat the nlogn barrier). > If you don't believe that last claim, suppose for the sake of the argument > that it's true (because if I actually implemented this I could supply > benchmarks to prove it).
I'd like to see these benchmarks, actually. Sounds interesting. How does it fare on different distributions of data - for instance, strings consisting exclusively of ASCII digits and punctuation eg "01:12:35,726 --> 01:12:36,810", or strings consisting primarily of ASCII but with occasional BMP or astral characters, or strings primarily of Cyrillic text, etc? What if every single string begins with a slash character (eg if you're sorting a bunch of path names)? At what list size does it become visibly faster? Could this be put onto PyPI and then benchmarked as lst.sort() vs flagsort(lst) ? ChrisA _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com