On 7/21/2010 6:56 PM, Daniel Stutzbach wrote:
> On Wed, Jul 21, 2010 at 5:27 PM, Terry Reedy <tjre...@udel.edu> wrote:
>> These tests use random numbers with a constant, relatively high
>> density of 25%, which is favorable to radix sort. What if you do the
>> same test with a constant range of, say, 1000000000 (1 billion) or
>> even a trillion or quadrillion? Or how about sorting a thousand
>> 128-bit IPv6 addresses? Which wins for that?
> blist switches to radix sort only if the keys contain only floats or
> only integers that fit into a C long. If the integers get too big, it
> reverts to timsort.
>
> When using a sort key, the decorate-sort-undecorate pattern requires
> touching every object once before the sort. The check for a consistent
> type is done inexpensively during the decorate phase. And it is pretty
> cheap even when there is no decorate phase.
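To illustrate the point being made here, the check can ride along on the decorate pass for free. The sketch below is hypothetical Python (blist's real implementation is in C); the function name and the C-long bounds check are illustrative assumptions, not blist's actual code:

```python
def sort_with_key(items, key):
    decorated = []
    all_small_ints = True   # could a radix sort handle these keys?
    for index, item in enumerate(items):
        k = key(item)
        # The type check piggybacks on the decorate pass we must make
        # anyway, so it adds almost nothing to the cost:
        if not (isinstance(k, int) and -2**31 <= k < 2**31):
            all_small_ints = False
        decorated.append((k, index, item))
    # A real implementation would dispatch to radix sort here when
    # all_small_ints is True; this sketch always uses the builtin sort.
    decorated.sort()
    return [item for _, _, item in decorated]
```

The `index` field in the decorated tuple also guarantees stability, since equal keys are tie-broken by original position.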
> Here's an example result with a density of only 1%, where the radix
> sort is around 4 times as fast:
> otto:~/src/blist$ python3.1 -m timeit -s 'import random' -s 'x =
> [random.randrange(10000*100) for i in range(10000)]' 'y = list(x)'
> 'y.sort(key=float)'
> 100 loops, best of 3: 12.4 msec per loop
>
> otto:~/src/blist$ python3.1 -m timeit -s 'from blist import blist' -s
> 'import random' -s 'x = [random.randrange(10000*100) for i in
> range(10000)]' 'y = blist(x)' 'y.sort(key=float)'
> 100 loops, best of 3: 3.4 msec per loop
> And a density of 0.01%:
>
> otto:~/src/blist$ python3.1 -m timeit -s 'import random' -s 'x =
> [random.randrange(10000*10000) for i in range(10000)]' 'y = list(x)'
> 'y.sort(key=float)'
> 100 loops, best of 3: 12 msec per loop
>
> otto:~/src/blist$ python3.1 -m timeit -s 'from blist import blist' -s
> 'import random' -s 'x = [random.randrange(10000*10000) for i in
> range(10000)]' 'y = blist(x)' 'y.sort(key=float)'
> 100 loops, best of 3: 3.52 msec per loop
Looks good so far. I would like to see that repeated all the way down to
range(10) to make sure people doing millions of small sorts were not
getting screwed.
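One way to run the sweep being requested here, sketched with the stdlib `timeit` module (this times the builtin list as a baseline only; blist is not assumed to be installed, and substituting it in would show whether small sorts regress):

```python
import random
import timeit

# Time 1000 copy-and-sort rounds at each list size, from tiny to large.
results = {}
for n in (10, 100, 1000, 10000):
    x = [random.randrange(n * 100) for _ in range(n)]
    results[n] = timeit.timeit(
        "y = list(x); y.sort(key=float)",
        globals={"x": x}, number=1000)

for n, seconds in sorted(results.items()):
    print(f"n={n:>6}: {seconds * 1000:.2f} ms for 1000 sorts")
```

Millions-of-small-sorts workloads live at the low end of this table, which is where per-sort setup overhead for a radix path would show up.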
>> list.sort is (near) linear for lists that are mostly ordered. I
>> think Tim's writeup about this is in the source. For instance, if
>> one extends a sorted list with 1% random additions and resorts,
>> list.sort will skip the sorted part, sort the additions, and merge
>> them back in. Radix sort ignores pre-existing order. So either a lot
>> of comparative profiling would be needed for auto-selection of sort
>> method, or it should be user selectable.
> I've tested exactly that scenario. For already-ordered lists, radix
> sort and timsort tie.
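The scenario under discussion is easy to reproduce: extend a sorted list with 1% random additions and resort. This is the case where timsort detects the long pre-sorted run and merges the small tail in, giving near-linear behavior:

```python
import random

n = 100_000
data = list(range(n))                                    # already sorted
data += [random.randrange(n) for _ in range(n // 100)]   # 1% random tail
data.sort()   # timsort finds the long sorted run, sorts and merges the tail
```

An order-oblivious radix sort does the same full-width work on this input as on a shuffled one, which is what makes the auto-selection question nontrivial.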
Tim tested about 7 list structure scenarios with a range of lengths. If
you post a patch, I would request that you do the same.
Have you run a patched version against test_sort.py? I believe it mostly
tests lists of small ints, so the radix methods would mostly be switched
in. If radix sort were added and the switching were internal, new test
cases would be needed to test timsort.
>> Does your radix sort meet the stability guarantee of list.sort?
> Yes.
Great. There is, of course, a test for that in the suite.
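For reference, a minimal stability check in the spirit of what the suite exercises (illustrative only, not the actual test_sort.py code): sort records on one field and verify that items with equal keys keep their original relative order.

```python
import random

# Each record carries its original position in the second field.
pairs = [(random.randrange(10), i) for i in range(1000)]
result = sorted(pairs, key=lambda p: p[0])   # sort on the key field only

# Stability: among equal keys, the original order must survive.
for key in range(10):
    positions = [i for k, i in result if k == key]
    assert positions == sorted(positions), "sort is not stable"
```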
--
Terry Jan Reedy
--
http://mail.python.org/mailman/listinfo/python-list