Steven D'Aprano wrote: > John Nagle wrote: > > Is "nlargest" smart enough to decide when it's cheaper to track the > > N largest entries on a linear pass through the list than to sort?
It *always* does a linear pass through the list (linear, that is in the length of the entire list). It tracks the n largest elements seen so far in a heap. > Doesn't appear to do so. From Python 3.1: I think you're mis-reading it. > def nlargest(n, iterable): > """Find the n largest elements in a dataset. > > Equivalent to: sorted(iterable, reverse=True)[:n] > """ > it = iter(iterable) > result = list(islice(it, n)) > if not result: > return result > heapify(result) > _heappushpop = heappushpop > for elem in it: > _heappushpop(result, elem) > result.sort(reverse=True) > return result That doesn't sort, or even heapify, the whole list. It keeps the largest n elements seen so far in the 'result' heap, smallest on top. Note the doc for heappushpop: "Push item on the heap, then pop and return the smallest item from the heap." Thus the 'result' heap stays of size n. The final result.sort() is just so the returned list is sorted, and assuming that's a requirement, I think the algorithm is asymptotically the best a comparison-based method can do, regardless of whether the length of the entire sequence dominates n. I figure the worst-case run time is Theta(s lg(n)) where s in the length of the sequence. > Interestingly, nsmallest does use two different algorithms, > depending on how many items you ask for. See the source code. That is interesting. The above algorithm for nlargest is better, but to use it for nsmallest requires a largest-on-top heap, which the module does not bother to implement. -- --Bryan -- http://mail.python.org/mailman/listinfo/python-list