topN using a heap

Andrei Alexandrescu via Digitalmars-d Sat, 16 Jan 2016 07:31:43 -0800

https://github.com/D-Programming-Language/phobos/pull/3934

So, say you're looking for the smallest 10 elements out of 100_000. The quickselect algorithm (which topN currently uses) will successively partition the set (assuming the pivot choice works well) into chunks of 50_000, 25_000, etc., all the way down to finding the smallest 10.
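To put a rough number on that, here is a back-of-the-envelope sum. It assumes (my assumption, not a measurement) that every pivot splits its chunk exactly in half and that each partition pass visits every element of its chunk once:

```latex
100\,000 + 50\,000 + 25\,000 + \dots \;<\; 2 \cdot 100\,000 = 200\,000
```

So even in the ideal case quickselect visits on the order of twice the input size, most of it spent on elements that are nowhere near the smallest 10.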

That's quite a bit of work, so 3934 uses an alternate strategy for finding the smallest 10:


1. Organize the first 11 elements into a max heap

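2. Scan all other elements progressively. Whenever an element is found that is smaller than the largest in the heap, swap it with the largest in the heap, then restore the heap property.

3. At the end, swap the largest in the heap (the heap's root) with the element at index 10, the heap's last slot, and you're done! The first 10 positions now hold the 10 smallest elements, in no particular order.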
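The three steps above can be sketched as follows. This is a minimal illustration in Python, not the code from the pull request; the names `top_n_heap` and `_sift_down` are hypothetical, and `nth` is taken to be a 0-based index (10 in the example above).

```python
def _sift_down(a, root, size):
    """Restore the max-heap property of a[0:size], starting at index root."""
    while True:
        largest = root
        left = 2 * root + 1
        right = left + 1
        if left < size and a[left] > a[largest]:
            largest = left
        if right < size and a[right] > a[largest]:
            largest = right
        if largest == root:
            return
        a[root], a[largest] = a[largest], a[root]
        root = largest


def top_n_heap(a, nth):
    """Rearrange a in place so that a[:nth] holds the nth smallest elements
    (in no particular order) and a[nth] holds the next smallest one."""
    if not 0 <= nth < len(a):
        raise ValueError("nth must be a valid index into a")
    size = nth + 1

    # Step 1: organize the first nth + 1 elements into a max heap.
    for root in range(size // 2 - 1, -1, -1):
        _sift_down(a, root, size)

    # Step 2: scan the rest; anything smaller than the heap's largest
    # element is swapped into the root, then the heap is repaired.
    for i in range(size, len(a)):
        if a[i] < a[0]:
            a[0], a[i] = a[i], a[0]
            _sift_down(a, 0, size)

    # Step 3: move the heap's largest element into slot nth.
    a[0], a[nth] = a[nth], a[0]
```

For instance, with `a = [5, 1, 4, 2, 3]`, calling `top_n_heap(a, 2)` leaves 1 and 2 in the first two slots (in some order) and 3 at index 2. Elements that are not smaller than the heap's root cost only one comparison each in the scan.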

This is very effective when only a few elements are requested, and is seldom mentioned in the selection literature. In fact, a less efficient approach (heapify the entire range) is discussed more often.
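For contrast, here is a rough sketch of the "heapify the entire range" approach, using Python's standard `heapq` module. The function name `smallest_by_full_heapify` is hypothetical, and working on a copy is my own choice to keep the example simple:

```python
import heapq


def smallest_by_full_heapify(a, count):
    """Return the count smallest elements of a in ascending order by
    building a min-heap over the whole input and popping count times."""
    heap = list(a)        # copy so the caller's list is left untouched
    heapq.heapify(heap)   # builds a heap over every element
    return [heapq.heappop(heap) for _ in range(min(count, len(heap)))]
```

Here every element takes part in building the heap, whereas the strategy above only ever maintains a heap of 11 elements and merely compares the rest against its root.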



Destroy!

Andrei
