Thank you very much. I am very happy to have received a reply from Tim Peters. You are correct; it was my mistake.
The Python code should be:

    for i in range(low+1, high):  # because we already added data[low]
        x = data[i]
        if x >= data[i-1]:

After I fixed it, here is the result for a random array of 10^6 elements:

    Old binsort: 1.3322
    New binsort: 1.0015
    ratio: 0.33

You are right; it is not ten times faster anymore. I will update the other results soon.

I did check many times that the results of the two sorting methods are the same; it is just that I do not know how to put an assert into the timeit.Timer class. I am pretty sure about this.

I will try to write the proof more clearly; sorry for the inconvenience.

Thank you very much.

Nha Pham.

On Mon, Mar 9, 2015 at 9:27 PM, Tim Peters <tim.pet...@gmail.com> wrote:
> [nha pham <phq...@gmail.com>]
> > Statement_1: With an array of size N or less than N, we need at most
> > log2(N) comparisons to find a value (or a position, in case the search
> > misses), using the binary search algorithm.
> >
> > proof: This statement is trivial, and I believe someone out there has
> > already proved it.
>
> Sorry for the quick message here.  It's just a simple point where it
> will pay not to get off on a wrong foot ;-)
>
> Correct:  for an array of size N, binary search can require as many as
> ceiling(log2(N+1)) comparisons.
>
> That's because there are N+1 possible results for an array of size N.
> For example, for an array of size 3, [A, B, C], "the answer" may be
> "before A", "between A and B", "between B and C", or "after C".  3
> elements, 3+1 = 4 possible results.  log2(3) comparisons are not
> enough to distinguish among 4 results.
>
> Make it trivial:  for an array of length 1, 1 comparison is obviously
> necessary and sufficient in all cases.  And, indeed,
> ceiling(log2(1+1)) = 1.  log2(1) equals 0, too small.
>
> For the rest, I haven't been able to understand your description or
> your pseudo-code.  I'll try harder.  Some things clearly aren't doing
> what you _intend_ them to do.
> For example, in your Python code, each
> time through the outer loop you're apparently trying to sort the next
> CHUNK elements, but you end up appending CHUNK+1 values to data2 (or
> data3).
>
> Or in this part:
>
>     for i in range(low, high):
>         x = data[i]
>         if x >= data[i-1]:
>
> the first time that loop is executed, low == 0, and so i == 0 on the
> first iteration, and so the conditional is
>
>     if x >= data[0-1]
>
> That's referencing data[-1], which is the very last element in data -
> which has nothing to do with the CHUNK you're trying to sort at the
> time.
>
> So there are a number of errors here, which makes it that much harder
> to sort out (pun intended <wink>) what you're trying to do.  It would
> help you to add some asserts to verify your code is doing what you
> _hope_ it's doing.  For example, add
>
>     assert data2[low: high] == sorted(data[low: high])
>     assert len(data2) == high
>
> to the end of your `sample` loop, and similarly for data3 in your
> `new` loop.  Until those asserts all pass, you're not testing code
> that's actually sorting correctly.  Repair the errors and you almost
> certainly won't find `new` running over 10 times faster than `sample`
> anymore.  I don't know what you _will_ discover, though.  If the code
> doesn't have to sort correctly, there are much simpler ways to make it
> run _very_ much faster ;-)
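P.S. On the question of how to combine an assert with timeit.Timer: one way is to time a small wrapper function that runs the sort and checks its output, since timeit.Timer accepts a callable directly. A minimal sketch (here `binsort` is a hypothetical stand-in for the function being timed; the real implementation is not shown in this message):

    import random
    import timeit

    def binsort(data):
        # Hypothetical placeholder for the sorting function under test.
        return sorted(data)

    data = [random.randrange(10**6) for _ in range(10**4)]

    def run():
        result = binsort(data)
        # Verify correctness on every timed run; drop the assert for the
        # final measurement if its cost matters.
        assert result == sorted(data)

    # timeit.Timer accepts a callable, so no statement string is needed.
    t = timeit.Timer(run)
    print(t.timeit(number=10))

This way an incorrect sort raises AssertionError instead of silently producing a fast-but-wrong timing.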
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com