Feature Requests item #1158313, was opened at 2005-03-07 09:15 Message generated for change (Comment added) made by rhettinger You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1158313&group_id=5470
>Category: Python Library >Group: None >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Stefan Behnel (scoder) Assigned to: Nobody/Anonymous (nobody) Summary: heapq should provide iterators: itersmallest, iterlargest Initial Comment: The current C-based heapq implementation provides implementations of min-heaps and max-heaps for the "nsmallest" and "nlargest" functions. I would like to see them used to provide iterators over the smallest/largest items of a heap: "itersmallest(heap)" and "iterlargest(heap)". Why: The functions nsmallest() and nlargest() are efficient (and sufficient) if n is known. However, if n is *not* known in advance, it would still be more efficient than sorting to run heapify() over a list and then have an iterator run heappop on each iteration. While this is easily implemented for min-heaps using heappop, max-heaps are not part of the Python-Implementation. Currently, they are only implemented in C. Possible counter-arguments: The straight-forward iterator implementation will manipulate the heap. This could be seen as a side-effect. It should, however, be enough to document that. Similar problems are documented for mutating sequences during iteration. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2005-03-08 11:15 Message: Logged In: YES user_id=80475 I am -1 on this feature request. The design of the module (as functions on lists rather than as a class with encapsulation) does not mesh well these proposals. Building out the API for maxheap functions makes the module harder to learn and it introduces a risk of a user mismatching functions (i.e. heappop with maxheapify). Maxheap functionality can already be obtained by decorating with class which redefines __cmp__. The case for an iterator wrapper is not strong. It was not well received when proposed on python-dev. The risks of mutating the heap during iteration are greater than the equivalent situation for lists because there is no simple way to verify that the heap condition has remained intact between iteration steps. If truly needs, a iterator around the existing minheap is trivial to write, Most user needs are met by sort(), nlargest(), nsmallest(), the existing minheap functions, or the existing minheap functions applied to decorated data. IMO, introducing iterators and maxheaps do not add enough value to warrant complicating the module and introducing new risks of misuse (altering the heapcondition during iteration or mismatching minheap and maxheap functions). The story would be somewhat different if the heaps had originally been encapsulated in a class. If they had, iterators could be added and given protection from mutation. Also, if there were a heap class, the API would easily support key= and reversed= options. But in the absence of a heap class, it would be a mistake to force these buildouts. The OP's corner use case presented on comp.lang.python was met with solutions using the existing toolset. The corner case was wanting a maxheap to read all data into memory, heapify it, and extract n largest elements AND n is not known in advance AND n is small enough that sort(data) was deemed to have too many comparisons. Not knowing n in advance arose because the data contained duplicate records which the OP was unwilling to filter-out in advance with a single pass O(n) step using a dictionary to condense equivalence groups to a single element. ---------------------------------------------------------------------- Comment By: Stefan Behnel (scoder) Date: 2005-03-08 04:20 Message: Logged In: YES user_id=313935 "easy and fast" was the python solution for min-heaps, you mean. Their API is sufficient for a few-line iterator implementation. The possible solutions for max-heaps are comparatively ugly and generally less efficient, the best ways being either a custom per-case solution (decorate-useheap-undecorate) or a generic indirection through an item wrapper class that mirrors the __le__ operator with the additional overhead of python's method call infrastructure. I don't think max-heaps are less common than min-heaps in any way. It just doesn't show that well since custom solutions are easy to write most of the time (each and every time). The major issue with the current heapq implementation is the design choice of not making it a class. I agree that it is a major advantage to have it operate on generic lists, but it come at the price of preventing easy integration of keyword arguments as in sort() and sorted(). A usage like h = heap(myIterator, reverse=true) for item in h: print item would be so handy... For the use-cases: As I said before, nsmallest/nlargest are enough in many cases, but they actually handle a very special case where n is known. On the other hand, iterators have become a major first-level construct for Python and I consider iteration over the ordered elements of a heap a very comon use case. The step towards an augmented API that provides efficient iterators both for min-heaps and max-heaps would both underline Python's paradigm shift and considerably simplify code that currently suffers from custom solutions. And again: most of the code is already there. Another possible solution: what about a module function heapified(seq_or_iter, key=..., cmp=..., reverse=...) in the style of sorted() but returning an iterator? Advantage over sorted() being the higher efficiency if the iterator is thrown away after a small number of calls. Just an idea... ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2005-03-07 11:22 Message: Logged In: YES user_id=80475 The idea for introducing heapiter() function was unsuccessfully proposed on python-dev: http://mail.python.org/pipermail/python-dev/2004-June/045348.html The use cases were rare and the pure python solution was both easy and fast. If you need C coded max-heaps, that could be a separate feature request. There would need to be sufficient use cases to warrant building out the module's API. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1158313&group_id=5470 _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com