Francis Girard <[EMAIL PROTECTED]> writes:
Thank you Nick and Steven for the idea of a more generic imerge.
If you want to get fancy, the merge should use a priority queue (like in the heapsort algorithm) instead of a linear scan through the incoming iters, to find the next item to output. That lowers the running time to O(n log k) instead of O(n*k), where k is the number of iterators and n is the length.
I looked at a heapq solution but didn't see any clean way of dealing with multiple iterators having equal values. The dict solution below deals cleanly with that, since one key can be shared by any number of iterators. Extracting the minimum, and the associated iterables is fast, but the overall solution is still slower than the brute force approach for the 3 hamming iterators.
>>> def imerge(*iterables): ... cache = {} ... iterators = map(iter,iterables) ... number = len(iterables) ... exhausted = 0 ... while 1: # First time through, looked at all of them # Subsequently, update only the minimum iterators ... for it in iterators: ... try: # Key each iterator by its next() value # Multiple iterators may share the same key ... cache.setdefault(it.next(),[]).append(it) ... except StopIteration: ... exhausted += 1 ... if exhausted == number: ... raise StopIteration # Get the lowest value ... val = min(cache) # and all the iterators that have that value ... iterators = cache.pop(val) ... yield val
>>> list(imerge([1, 2, 3, 6], [1, 2, 3, 7], [1, 2, 3, 4, 5])) [1, 2, 3, 4, 5, 6, 7] >>>
Michael
-- http://mail.python.org/mailman/listinfo/python-list