Re: [Python-Dev] Tunning binary insertion sort algorithm in Timsort.

Tim Peters Mon, 09 Mar 2015 21:29:18 -0700

[nha pham <phq...@gmail.com>]
> Statement_1: With an array of size N or less than N, we need at most log2(N)
> comparisons to find a value
> (or a position, incase the search miss), using the binary search algorithm.
>
> proof: This statement is trivia, and I believe, someone outthere already
> proved it.


Sorry for the quick message here.  It's just a simple point where it
will pay not to get off on a wrong foot ;-)

Correct:  for an array of size N, binary search can require as many as
ceiling(log2(N+1)) comparisons.

That's because there are N+1 possible results for an array of size N.
For example, for an array of size 3, [A, B, C], "the answer" may be
"before A", "between A and B", "between B and C", or "after C".  3
elements, 3+1 = 4 possible results.  log2(3) comparisons are not
enough to distinguish among 4 results.

Make it trivial, an array of length 1.  Then 1 comparison is obviously
necessary and sufficient in all cases.  And, indeed,
ceiling(log2(1+1)) = 1.  log2(1) equals 0, too small.

For the rest, I haven't been able to understand your description or
your pseudo-code.  I'll try harder.  Some things clearly aren't doing
what you _intend_ them to do.  For example, in your Python code, each
time through the outer loop you're apparently trying to sort the next
CHUNK elements, but you end up appending CHUNK+1 values to data2 (or
data3).

Or in this part:

    for i in range(low,high):
        x = data[i]
        if x >= data[i-1]:

the first time that loop is executed low == 0, and so i == 0 on the
first iteration, and so the conditional is

       if x >= data[0-1]

That's referencing data[-1], which is the very last element in data -
which has nothing to do with the CHUNK you're trying to sort at the
time.

So there are a number of errors here, which makes it that much harder
to sort out (pun intended <wink>) what you're trying to do.  It would
help you to add some asserts to verify your code is doing what you
_hope_ it's doing.  For example, add

    assert data2[low: high] == sorted(data[low: high])
    assert len(data2) == high

to the end of your `sample` loop, and similarly for data3 in your
`new` loop.  Until those asserts all pass, you're not testing code
that's actually sorting correctly.  Repair the errors and you almost
certainly won't find `new` running over 10 times faster than `sample`
anymore.  I don't know what you _will_ discover, though.  If the code
doesn't have to sort correctly, there are much simpler ways to make it
run _very_ much faster ;-)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Tunning binary insertion sort algorithm in Timsort.

Reply via email to