Feature Requests item #1185121, was opened at 2005-04-18 07:11 Message generated for change (Comment added) made by rhettinger You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1185121&group_id=5470
Category: Python Library Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jurjen N.E. Bos (jneb) Assigned to: Raymond Hettinger (rhettinger) Summary: itertools.imerge: merge sequences Initial Comment: (For the itertools library, so Python 2.2 and up) This is a suggested addition to itertools, proposed name imerge. usage: imerge(seq0, seq1, ..., [key=<key function>]) result: imerge assumes the sequences are all in sorted order, and produces a iterator that returns pairs of the form (value, index), where value is a value of one of the sequences, and index is the index number of the given sequence. The output the imerge is in sorted order (taking into account the key function), so that identical values in the sequences will be produced from left to right. The code is surprisingly short, making use of the builtin heap module. (You may disagree with my style of argument handling; feel free to optimize it.) def imerge(*iterlist, **key): """Merge a sequence of sorted iterables. Returns pairs [value, index] where each value comes from iterlist[index], and the pairs are sorted if each of the iterators is sorted. Hint use groupby(imerge(...), operator.itemgetter(0)) to get the items one by one. """ if key.keys() not in ([], ["key"]): raise TypeError, "Excess keyword arguments for imerge" key = key.get("key", lambda x:x) from heapq import heapreplace, heappop #initialize the heap containing (inited, value, index, currentItem, iterator) #this automatically makes sure all iterators are initialized, then run, and finally emptied heap = [(False, None, index, None, iter(iterator)) for index, iterator in enumerate(iterlist)] while heap: inited, item, index, value, iterator = heap[0] if inited: yield value, index try: item = iterator.next() except StopIteration: heappop(heap) else: heapreplace(heap, (True, key(item), index, item, iterator)) If you find this little routine worth its size, please put it into itertools. - Jurjen ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2005-04-18 17:43 Message: Logged In: YES user_id=80475 I had previously looked at an imerge() utility and found that it had only a single application (isomorphic to lazy mergesorting) and that the use cases were dominated by the in-memory alternative: sorted(chain(*iterlist)). Short of writing an external mergesort, what applications did you have in mind? What situations have you encountered where you have multiple sources of sorted data being generated on the fly (as opposed to already being in-memory), have needed one element at a time sequential access to a combined sort of that data, needed that combined sort only once, and could not afford to have the dataset in-memory? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1185121&group_id=5470 _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com