In article <[email protected]>, Duncan Booth <[email protected]> wrote:
> Roy Smith <[email protected]> wrote:
>
>> I'm processing a stream of N numbers and want to keep track of the K
>> largest.  There's too many numbers in the stream (i.e. N is too large)
>> to keep in memory at once.  K is small (100 would be typical).
>> ...
>> Is there a better way to do this, either from a theoretical running
>> time point of view, or just a nicer way to code this in Python?
>
> How about:
>
>     from heapq import nlargest
>     top = nlargest(K, input())
>
> It uses a heap so avoids completely resorting each time.

Hmmm, that looks like it would solve the problem as stated, but there's
another twist.  In addition to finding the K largest values, I *also*
need to do some other processing on all the values (which I didn't show
in the original example, incorrectly thinking it not germane to my
question).  The problem with nlargest() is that it doesn't give me a
hook to do that.

PS: I'm assuming heapq.nlargest(n, iterable) operates with memory
proportional to n, and not to the iterator length.  That's the only
reasonable conclusion, but the docs don't actually come out and say it.

--
http://mail.python.org/mailman/listinfo/python-list
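One way to keep that hook is to drive the heap by hand instead of
calling nlargest().  Here's a sketch using heapq's heappush/heappushpop
directly; process() is a hypothetical placeholder for the extra
per-value work, and memory stays proportional to k rather than to the
stream length:

```python
import heapq
import random

def process(x):
    # Hypothetical stand-in for the "other processing" done on every value.
    pass

def k_largest_with_hook(stream, k):
    """Consume a stream once, processing every value and keeping the k largest."""
    heap = []  # min-heap of at most k elements; heap[0] is the smallest kept
    for x in stream:
        process(x)  # the per-value hook that nlargest() doesn't expose
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            # Push x and pop the current minimum in one O(log k) operation.
            heapq.heappushpop(heap, x)
    return sorted(heap, reverse=True)

values = [random.randrange(10**6) for _ in range(10000)]
top = k_largest_with_hook(iter(values), 100)
```

This is O(N log K) time overall, the same bound nlargest() gets, since
each of the N values costs at most one O(log K) heap operation.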
