Re: topN using a heap

Ivan Kazmenko via Digitalmars-d Sat, 16 Jan 2016 13:30:48 -0800

On Saturday, 16 January 2016 at 15:25:50 UTC, Andrei Alexandrescuwrote:

That's quite a bit of work, so 3934 uses an alternate strategyfor finding the smallest 10:
1. Organize the first 11 elements into a max heap
2. Scan all other elements progressively. Whenever an elementis found that is smaller than the largest in the heap, swap itwith the largest in the heap then restore the heap property.
3. At the end, swap the largest in the heap with the 10th andyou're done!
This is very effective, and is seldom referred in the selectionliterature. In fact, a more inefficient approach (heapify theentire range) is discussed more often.

Yeah, this sounds nice. Suppose we have an array of length n andwant to find the smallest k elements. The first step (heapify)is O(k) operations.


1. Let us analyze the rest for "small enough" k.

1a. If the array is "random", the expected number of insertionsis the sum of probabilities that each new element is inserted.This number is (k/(k+1)) + (k/(k+2)) + ... + k/n, which is ktimes a part of harmonic series, that is, on the order of k logn. In each case, O(log k) operations are required to process thenew element. In total, the "average" case takes (n-k) + (k log nlog k), a total of O(n) for small enough values of k.

1b. In the worst case (each element is less than the currenttop), this still requires (n-k) log k operations, so we can saythe total is O(n log k).

2. For k of the same order as n, the algorithm is O(n log n),just as a regular HeapSort.


To summarize:
For k<<n we have O(n) average case and O(n log k) worst case.
For k~n we have O(n log n) as HeapSort.

Re: topN using a heap

Reply via email to