[nha pham <phq...@gmail.com>] > Statement_1: With an array of size N or less than N, we need at most log2(N) > comparisons to find a value > (or a position, incase the search miss), using the binary search algorithm. > > proof: This statement is trivia, and I believe, someone outthere already > proved it.
Sorry for the quick message here. It's just a simple point where it will pay not to get off on a wrong foot ;-) Correct: for an array of size N, binary search can require as many as ceiling(log2(N+1)) comparisons. That's because there are N+1 possible results for an array of size N. For example, for an array of size 3, [A, B, C], "the answer" may be "before A", "between A and B", "between B and C", or "after C". 3 elements, 3+1 = 4 possible results. log2(3) comparisons are not enough to distinguish among 4 results. Make it trivial, an array of length 1. Then 1 comparison is obviously necessary and sufficient in all cases. And, indeed, ceiling(log2(1+1)) = 1. log2(1) equals 0, too small. For the rest, I haven't been able to understand your description or your pseudo-code. I'll try harder. Some things clearly aren't doing what you _intend_ them to do. For example, in your Python code, each time through the outer loop you're apparently trying to sort the next CHUNK elements, but you end up appending CHUNK+1 values to data2 (or data3). Or in this part: for i in range(low,high): x = data[i] if x >= data[i-1]: the first time that loop is executed low == 0, and so i == 0 on the first iteration, and so the conditional is if x >= data[0-1] That's referencing data[-1], which is the very last element in data - which has nothing to do with the CHUNK you're trying to sort at the time. So there are a number of errors here, which makes it that much harder to sort out (pun intended <wink>) what you're trying to do. It would help you to add some asserts to verify your code is doing what you _hope_ it's doing. For example, add assert data2[low: high] == sorted(data[low: high]) assert len(data2) == high to the end of your `sample` loop, and similarly for data3 in your `new` loop. Until those asserts all pass, you're not testing code that's actually sorting correctly. Repair the errors and you almost certainly won't find `new` running over 10 times faster than `sample` anymore. I don't know what you _will_ discover, though. If the code doesn't have to sort correctly, there are much simpler ways to make it run _very_ much faster ;-) _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com