Re: heapq "key" arguments
On Aug 3, 1:36 pm, Raymond Hettinger wrote: > [Joshua Bronson]: > > > According tohttp://docs.python.org/library/heapq.html, Python 2.5 > > added an optional "key" argument to heapq.nsmallest and > > heapq.nlargest. I could never understand why they didn't also add a > > "key" argument to the other relevant functions (heapify, heappush, > > etc). > > The problem is that heapq acts on regular lists, so it does not > have exclusive access to the structure. So, there is no reliable > way for it to maintain a separate list of keys. Since the keys > can't be saved in the structure (without possibly breaking other > code), the fine grained heapq functions (like heappop and heappush) > would need to call key functions every time they are invoked. > This is at odds with the implicit guarantee of the key function > that it will be called no more than once per key. > > The overall problem is one of granularity. A key function should > be applied once in an initial pass, not on every call to a push/pop > function. The everyday solution that most people use is to operate > on a list of (key, record) tuples and let tuple comparison do the > work for you. Another solution is to build a Heap class that does > have exclusive access to the structure, but the API sugar often > isn't worth the slightly weaker performance. > > One other thought. Heaps are a lazy evaluation structure, so their > fined-grained mutation functions only work well with just a single > ordering function, so there is not need to have (and every reason > to avoid) changing key functions in mid-stream. IOW, the key > function needs to be constant across all accesses. Contrast this > with other uses of key functions where it makes perfect sense > to run minage=min(data, key=attrgetter('age')) and then running > minsal=min(data, key=attrgetter('salary')). The flexibility to > change key functions just doesn't make sense in the context of > the fine-grained heap functions. > > Accordingly, this is why I put key functions in nlargest() and > nsmallest() but not in heappush() and friends. The former can > guarantee no more than one key function call per entry and they > evaluate immediately instead of lazily. > > Raymond I see, that makes sense. Thanks for the great explanation. Josh -- http://mail.python.org/mailman/listinfo/python-list
Re: heapq "key" arguments
[Joshua Bronson]: > According tohttp://docs.python.org/library/heapq.html, Python 2.5 > added an optional "key" argument to heapq.nsmallest and > heapq.nlargest. I could never understand why they didn't also add a > "key" argument to the other relevant functions (heapify, heappush, > etc). The problem is that heapq acts on regular lists, so it does not have exclusive access to the structure. So, there is no reliable way for it to maintain a separate list of keys. Since the keys can't be saved in the structure (without possibly breaking other code), the fine grained heapq functions (like heappop and heappush) would need to call key functions every time they are invoked. This is at odds with the implicit guarantee of the key function that it will be called no more than once per key. The overall problem is one of granularity. A key function should be applied once in an initial pass, not on every call to a push/pop function. The everyday solution that most people use is to operate on a list of (key, record) tuples and let tuple comparison do the work for you. Another solution is to build a Heap class that does have exclusive access to the structure, but the API sugar often isn't worth the slightly weaker performance. One other thought. Heaps are a lazy evaluation structure, so their fined-grained mutation functions only work well with just a single ordering function, so there is not need to have (and every reason to avoid) changing key functions in mid-stream. IOW, the key function needs to be constant across all accesses. Contrast this with other uses of key functions where it makes perfect sense to run minage=min(data, key=attrgetter('age')) and then running minsal=min(data, key=attrgetter('salary')). The flexibility to change key functions just doesn't make sense in the context of the fine-grained heap functions. Accordingly, this is why I put key functions in nlargest() and nsmallest() but not in heappush() and friends. The former can guarantee no more than one key function call per entry and they evaluate immediately instead of lazily. Raymond -- http://mail.python.org/mailman/listinfo/python-list
Re: heapq "key" arguments
[Duncan Booth] > The documentation doesn't say anything directly about stability, but the > implementation is actually stable. You can probably assume it must be at > least for nlargest and nsmallest otherwise the stated equivalence wouldn't > hold: > > e.g. nsmallest documentation says: > > Equivalent to: sorted(iterable, key=key)[:n] Yes. The code for nsmallest and nlargest preserves stability so that the equivalence is maintained. Raymond -- http://mail.python.org/mailman/listinfo/python-list
Re: heapq "key" arguments
"Gabriel Genellina" wrote: > Ok, it's not strictly the same, but usually it doesn't hurt. The heaqp > module doesn't promise anything about equal elements: it may keep the > original order, rearrange them at will, reverse them, whatever. The documentation doesn't say anything directly about stability, but the implementation is actually stable. You can probably assume it must be at least for nlargest and nsmallest otherwise the stated equivalence wouldn't hold: e.g. nsmallest documentation says: Equivalent to: sorted(iterable, key=key)[:n] -- Duncan Booth http://kupuguy.blogspot.com -- http://mail.python.org/mailman/listinfo/python-list
Re: heapq "key" arguments
En Fri, 31 Jul 2009 16:38:01 -0300, Joshua Bronson escribió: On Jul 31, 2:02 pm, Jonathan Gardner wrote: On Jul 31, 10:44 am, Joshua Bronson wrote: > Say I want to maintain a heap of (x, y) pairs sorted only by > first coordinate. Without being able to pass key=itemgetter(0), won't > heapifying a list of such pairs unnecessarily compare both > coordinates? It will compare the second value only if the first values are equal. I don't see how that helps. That's not at all the same thing as being able to pass key=itemgetter(0). Ok, it's not strictly the same, but usually it doesn't hurt. The heaqp module doesn't promise anything about equal elements: it may keep the original order, rearrange them at will, reverse them, whatever. So the element-wise comparison of tuples is as good as comparing only the first element - *except* when comparing the second element isn't cheap or has side effects or something like that. In that case, use a custom class to redefine comparison operators: from heapq import heapify, heappop class tuplebyfirst(tuple): "tuple that sorts only on first element" def __lt__(self, other): return self[0](I've used an undocumented property: all heapq functions compare elements ONLY by using "<", in 2.6.2 at least. Defining all the other rich comparison methods doesn't change anything) -- Gabriel Genellina -- http://mail.python.org/mailman/listinfo/python-list
Re: heapq "key" arguments
On Jul 31, 2:02 pm, Jonathan Gardner wrote: > On Jul 31, 10:44 am, Joshua Bronson wrote: > > > Say I want to maintain a heap of (x, y) pairs sorted only by > > first coordinate. Without being able to pass key=itemgetter(0), won't > > heapifying a list of such pairs unnecessarily compare both > > coordinates? > > It will compare the second value only if the first values are equal. I don't see how that helps. That's not at all the same thing as being able to pass key=itemgetter(0). -- http://mail.python.org/mailman/listinfo/python-list
Re: heapq "key" arguments
On Jul 31, 10:44 am, Joshua Bronson wrote: > Say I want to maintain a heap of (x, y) pairs sorted only by > first coordinate. Without being able to pass key=itemgetter(0), won't > heapifying a list of such pairs unnecessarily compare both > coordinates? It will compare the second value only if the first values are equal. -- http://mail.python.org/mailman/listinfo/python-list
heapq "key" arguments
According to http://docs.python.org/library/heapq.html, Python 2.5 added an optional "key" argument to heapq.nsmallest and heapq.nlargest. I could never understand why they didn't also add a "key" argument to the other relevant functions (heapify, heappush, etc). Say I want to maintain a heap of (x, y) pairs sorted only by first coordinate. Without being able to pass key=itemgetter(0), won't heapifying a list of such pairs unnecessarily compare both coordinates? And worse, if the second coordinate is actually an object with no ordering defined for it, heapifying will cause an error even though all I care about is sorting by the first coordinate, which does have an ordering. Am I missing something? -- http://mail.python.org/mailman/listinfo/python-list