On Saturday, 16 January 2016 at 15:25:50 UTC, Andrei Alexandrescu wrote:
That's quite a bit of work, so 3934 uses an alternate strategy for finding the smallest 10:

1. Organize the first 11 elements into a max heap

2. Scan all other elements progressively. Whenever an element is found that is smaller than the largest in the heap, swap it with the largest in the heap then restore the heap property.

3. At the end, swap the largest in the heap with the 10th and you're done!

This is very effective, and is seldom referred in the selection literature. In fact, a more inefficient approach (heapify the entire range) is discussed more often.

Yeah, this sounds nice. Suppose we have an array of length n and want to find the smallest k elements. The first step (heapify) is O(k) operations.

1. Let us analyze the rest for "small enough" k.

1a. If the array is "random", the expected number of insertions is the sum of probabilities that each new element is inserted. This number is (k/(k+1)) + (k/(k+2)) + ... + k/n, which is k times a part of harmonic series, that is, on the order of k log n. In each case, O(log k) operations are required to process the new element. In total, the "average" case takes (n-k) + (k log n log k), a total of O(n) for small enough values of k.

1b. In the worst case (each element is less than the current top), this still requires (n-k) log k operations, so we can say the total is O(n log k).

2. For k of the same order as n, the algorithm is O(n log n), just as a regular HeapSort.

To summarize:
For k<<n we have O(n) average case and O(n log k) worst case.
For k~n we have O(n log n) as HeapSort.

Reply via email to