On Aug 3, 1:36 pm, Raymond Hettinger <pyt...@rcn.com> wrote: > [Joshua Bronson]: > > > According tohttp://docs.python.org/library/heapq.html, Python 2.5 > > added an optional "key" argument to heapq.nsmallest and > > heapq.nlargest. I could never understand why they didn't also add a > > "key" argument to the other relevant functions (heapify, heappush, > > etc). > > The problem is that heapq acts on regular lists, so it does not > have exclusive access to the structure. So, there is no reliable > way for it to maintain a separate list of keys. Since the keys > can't be saved in the structure (without possibly breaking other > code), the fine grained heapq functions (like heappop and heappush) > would need to call key functions every time they are invoked. > This is at odds with the implicit guarantee of the key function > that it will be called no more than once per key. > > The overall problem is one of granularity. A key function should > be applied once in an initial pass, not on every call to a push/pop > function. The everyday solution that most people use is to operate > on a list of (key, record) tuples and let tuple comparison do the > work for you. Another solution is to build a Heap class that does > have exclusive access to the structure, but the API sugar often > isn't worth the slightly weaker performance. > > One other thought. Heaps are a lazy evaluation structure, so their > fined-grained mutation functions only work well with just a single > ordering function, so there is not need to have (and every reason > to avoid) changing key functions in mid-stream. IOW, the key > function needs to be constant across all accesses. Contrast this > with other uses of key functions where it makes perfect sense > to run minage=min(data, key=attrgetter('age')) and then running > minsal=min(data, key=attrgetter('salary')). The flexibility to > change key functions just doesn't make sense in the context of > the fine-grained heap functions. > > Accordingly, this is why I put key functions in nlargest() and > nsmallest() but not in heappush() and friends. The former can > guarantee no more than one key function call per entry and they > evaluate immediately instead of lazily. > > Raymond
I see, that makes sense. Thanks for the great explanation. Josh -- http://mail.python.org/mailman/listinfo/python-list