OK - here's what the current binsort does, except that this model ignores the already-sorted prefix (if any) and builds a new list instead of modifying in place:
    def oldsort(a):
        from bisect import bisect_right as br
        assert a  # requires a non-empty sequence
        result = [a[0]]
        for i in range(1, len(a)):
            x = a[i]
            index = br(result, x)
            result.insert(index, x)
        return result

And here's my best guess as to what you _intend_ the new version to do. Please confirm that, or, if I'm guessing wrong, please give a Python function that _does_ implement your intent:

    def newsort(a):
        from bisect import bisect_right as br
        assert a
        oldx = a[0]
        result = [oldx]
        index = 0
        for i in range(1, len(a)):
            x = a[i]
            if x < oldx:
                # smaller than the previous element: search only to
                # the left of where that one landed
                index = br(result, x, 0, index)
            else:
                # else search only to its right
                index = br(result, x, index + 1)
            result.insert(index, x)
            oldx = x
        return result

Now assuming that's right, I don't care about timing it ;-) The only basic question to me is whether it in fact reduces the number of comparisons. So here's an integer wrapper that bumps a global counter whenever it's asked to compare:

    class IntWrap(object):
        def __init__(self, i):
            self.i = i

        def __cmp__(a, b):
            global gncmp
            gncmp += 1
            return cmp(a.i, b.i)

        def __repr__(self):
            return repr(self.i)

Now we can build lists of those, and get exact comparison counts. To start, for a given length `n`, this counts the total number of comparisons needed to sort all possible permutations of a list of length `n`, under both the old and new ways:

    def drive(n):
        import itertools
        global gncmp
        base = [IntWrap(i) for i in range(n)]
        oldcount = newcount = 0
        numperms = 0
        for p in itertools.permutations(base):
            numperms += 1

            gncmp = 0
            oldresult = oldsort(p)
            oldcount += gncmp

            gncmp = 0
            newresult = newsort(p)
            newcount += gncmp

            assert oldresult == newresult == base

        print 'n', n, 'perms', numperms
        print 'old compares', oldcount
        print 'new compares', newcount
        print 'diff %', (newcount - oldcount) * 100.0 / max(oldcount, 1)

And, finally, a tiny loop to drive it:

    for n in range(1, 13):
        drive(n)
        print

It's still running as I type this, but the results aren't promising: as soon as the list length gets non-trivial, the new way requires more comparisons than the old way:

    n 1 perms 1
    old compares 0
    new compares 0
    diff % 0.0

    n 2 perms 2
    old compares 2
    new compares 2
    diff % 0.0

    n 3 perms 6
    old compares 16
    new compares 16
    diff % 0.0

    n 4 perms 24
    old compares 112
    new compares 116
    diff % 3.57142857143

    n 5 perms 120
    old compares 848
    new compares 880
    diff % 3.77358490566

    n 6 perms 720
    old compares 7008
    new compares 7296
    diff % 4.1095890411

    n 7 perms 5040
    old compares 63456
    new compares 66432
    diff % 4.68986384266

    n 8 perms 40320
    old compares 628608
    new compares 662496
    diff % 5.39095907147

    n 9 perms 362880
    old compares 6826752
    new compares 7202304
    diff % 5.50118123523

    n 10 perms 3628800
    old compares 80605440
    new compares 85006080
    diff % 5.45948263542

I believe it would be very difficult to analyze this rigorously - and even if I saw an analysis it would be hard to trust it. Raw counts from simple code are hard to argue with ;-)

FYI, here are two ideas I had way back when, but didn't pursue:

1. Merge "2 at a time" instead of just 1. That is, first "sort" the next 2 elements to be merged (1 compare and a possible swap). Then binary search to find where the smaller belongs, and a shorter binary search to find where the larger belongs. Then shift both into place (a rough sketch follows this list). This can win on two counts:

   A. Less data movement, since the already-sorted values after the larger element get moved only once instead of twice.

   B. A possible cut in the number of compares.

   Merging a sorted list of N elements with a sorted list of 2 elements has been studied a lot (e.g., search for "optimal merging of 2 elements" and find the paper by Hwang and Lin). The 2 new elements can land among the N old ones in (N+2)*(N+1)/2 distinct ways, so the theoretical minimum number of compares needed (in the worst case) is ceiling(log2((N+2)*(N+1)/2)) - e.g., 7 compares for N=10, since ceiling(log2(66)) = 7.

2. Instead of binary insertion sort, do an ordinary (but optimized) bottom-up merge sort (a bare-bones sketch is in the P.S.). That may not cut the number of compares, but would slash the worst-case data-movement cost from O(n**2) to O(n*log(n)).
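For concreteness, here's a minimal sketch of idea 1. The `pairsort` name is made up, and the list-based `insert` calls still move the tail twice, so this shows the compare pattern in B but not the data-movement win in A (a real version would slide the array slots by hand):

    def pairsort(a):
        # Sketch only: binary insertion sort consuming two elements
        # per outer step.
        from bisect import bisect_right as br, insort_right
        result = list(a[:1])
        i = 1
        while i + 1 < len(a):
            x, y = a[i], a[i + 1]
            if y < x:  # 1 compare (and a possible swap) to order the pair
                x, y = y, x
            # Full-width binary search for the smaller element ...
            k = br(result, x)
            # ... then a shorter search, over result[k:], for the larger.
            j = br(result, y, k)
            # Insert the larger first so k still points at the right
            # slot for the smaller.
            result.insert(j, y)
            result.insert(k, x)
            i += 2
        if i < len(a):  # odd length: one leftover element
            insort_right(result, a[i])
        return result

It accepts the same permutation tuples `drive()` feeds `oldsort()` and `newsort()`, so wiring it in as a third contender is an easy way to find out whether B actually materializes.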
As to why your code is sometimes faster: the Python code in your timing harness didn't actually sort anything, so it wasn't measuring anything interesting (or even explainable ;-) ). For the Java code, I have no guess - I don't know enough about Java internals. Maybe "lucky" data, maybe cache effects, maybe a mistake - I don't know, and can't guess. Or maybe my guess (above) at the intent of your code is all wrong.

It _is_ an interesting idea, though! Thanks for bringing it up :-)
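P.S. For completeness, here's the bare-bones shape of idea 2, minus all the optimizations (the names are made up; nothing here is tuned):

    def merge(left, right):
        # Standard two-way merge; stable because ties take from `left`.
        out = []
        i = j = 0
        while i < len(left) and j < len(right):
            if right[j] < left[i]:
                out.append(right[j])
                j += 1
            else:
                out.append(left[i])
                i += 1
        out.extend(left[i:])
        out.extend(right[j:])
        return out

    def mergesort(a):
        # Bottom-up: merge adjacent runs of width 1, 2, 4, ... until
        # a single run remains.
        a = list(a)
        n = len(a)
        width = 1
        while width < n:
            for lo in range(0, n, 2 * width):
                mid = min(lo + width, n)
                hi = min(lo + 2 * width, n)
                a[lo:hi] = merge(a[lo:mid], a[mid:hi])
            width *= 2
        return a

The data movement here is the O(n*log(n)) slice copying; as noted above, the compare count may not improve on binary insertion's.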