On 7/21/2010 6:56 PM, Daniel Stutzbach wrote:
On Wed, Jul 21, 2010 at 5:27 PM, Terry Reedy <tjre...@udel.edu> wrote:

    These tests use random numbers with a constant, relatively high
    density of 25%, which is favorable to radix sort. What if you do the
    same test with a constant range of, say, 1000000000 (1 billion) or
    even a trillion or quadrillion? Or how about sorting a thousand
    128-bit IPv6 addresses? Which wins for that?

blist switches to radix sort only if the keys contain only floats or
only integers that fit into a C long.  If the integers get too big, it
reverts to timsort.

When using a sort key, the decorate-sort-undecorate pattern requires
touching every object once before the sort.  The check for a consistent
type is done inexpensively during the decorate phase.

And it is pretty cheap even when there is no decorate phase.
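To make the idea concrete, here is a minimal sketch of the kind of consistency check that could run during the decorate phase. The function name and the C-long bound are illustrative assumptions, not blist's actual code (blist performs this in C):

```python
# Sketch of a decorate-phase type check: radix sort applies only if
# every key is an int that fits in a C long, or every key is a float.
# C_LONG_MAX/MIN assume a 64-bit C long; names here are hypothetical.
C_LONG_MAX = 2**63 - 1
C_LONG_MIN = -(2**63)

def can_radix_sort(keys):
    # exact-type checks: a bool or int subclass could override comparison
    if all(type(k) is int and C_LONG_MIN <= k <= C_LONG_MAX for k in keys):
        return True
    if all(type(k) is float for k in keys):
        return True
    return False  # mixed or oversized keys: fall back to timsort
```

Since the decorate phase already touches every key once, folding this check into the same pass adds only a cheap per-element type test.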

Here's an example result with a density of only 1%, where the radix sort
is around 4 times as fast:

otto:~/src/blist$ python3.1 -m timeit -s 'import random' -s 'x =
[random.randrange(10000*100) for i in range(10000)]' 'y = list(x)'
'y.sort(key=float)'
100 loops, best of 3: 12.4 msec per loop

otto:~/src/blist$ python3.1 -m timeit -s 'from blist import blist' -s
'import random' -s 'x = [random.randrange(10000*100) for i in
range(10000)]' 'y = blist(x)' 'y.sort(key=float)'
100 loops, best of 3: 3.4 msec per loop

And a density of 0.01%:

otto:~/src/blist$ python3.1 -m timeit -s 'import random' -s 'x =
[random.randrange(10000*10000) for i in range(10000)]' 'y = list(x)'
'y.sort(key=float)'
100 loops, best of 3: 12 msec per loop

otto:~/src/blist$ python3.1 -m timeit -s 'from blist import blist' -s
'import random' -s 'x = [random.randrange(10000*10000) for i in
range(10000)]' 'y = blist(x)' 'y.sort(key=float)'
100 loops, best of 3: 3.52 msec per loop

Looks good so far. I would like to see that repeated all the way down to range(10) to make sure people doing millions of small sorts were not getting screwed.

    list.sort is (near) linear for lists that are mostly ordered. I
    think Tim's writeup about this is in the source. For instance, if
    one extends a sorted list with 1% random additions and resorts,
    list.sort will skip the sorted part, sort the additions, and merge
    them back in. Radix sort ignores pre-existing order. So either a
    lot of comparative profiling would be needed for auto-selection of
    sort method, or it should be user selectable.
I've tested exactly that scenario.  For already-ordered lists, radix
sort and timsort tie.
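The mostly-ordered scenario is easy to observe directly: count the comparisons list.sort performs on a sorted list extended with 1% random additions. This harness is illustrative (not from the thread); it counts __lt__ calls, which is the only comparison timsort uses:

```python
import random

class Counted(int):
    """int subclass that counts __lt__ calls made by list.sort."""
    calls = 0
    def __lt__(self, other):
        Counted.calls += 1
        return int.__lt__(self, other)

n = 100_000
base = [Counted(i) for i in range(n)]                       # already sorted
extra = [Counted(random.randrange(n)) for _ in range(n // 100)]  # 1% additions
data = base + extra

Counted.calls = 0
data.sort()
# Timsort detects the long pre-sorted run, sorts only the additions,
# and merges them back in -- far fewer than the ~n*log2(n) comparisons
# a from-scratch comparison sort would need.
print(Counted.calls)
```

A pure radix sort gets no such benefit from pre-existing order: its pass count is fixed by the key width regardless of how sorted the input already is.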

Tim tested about 7 list structure scenarios with a range of lengths. If you post a patch, I would request that you do the same.

Have you run a patched version against test_sort.py? I believe it mostly tests lists of small ints, so radix methods would mostly be switched in.

If it were added and the switching were internal, new test cases would be needed to test timsort.

    Does your radix sort meet the stability guarantee of list.sort?
Yes.

Great. There is, of course, a test for that in the suite.
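Stability is also easy to check by hand: sort records on a key that collides and confirm that equal-key items keep their original left-to-right order. A minimal sketch (not the suite's actual test):

```python
# Records sharing a sort key must keep their original relative order.
records = [('b', 1), ('a', 2), ('b', 0), ('a', 1)]
records.sort(key=lambda r: r[0])   # sort only on the first field
print(records)  # [('a', 2), ('a', 1), ('b', 1), ('b', 0)]
```

An LSD radix sort is stable as long as each pass distributes keys into buckets in input order, which is why it can honor the same guarantee.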

--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list
