Paul Rubin wrote:
Francis Girard <[EMAIL PROTECTED]> writes:

Thank you Nick and Steven for the idea of a more generic imerge.


If you want to get fancy, the merge should use a priority queue (like
in the heapsort algorithm) instead of a linear scan through the
incoming iters, to find the next item to output.  That lowers the
running time to O(n log k) instead of O(n*k), where k is the number of
iterators and n is the length.

I looked at a heapq solution but didn't see any clean way of dealing with multiple iterators having equal values. The dict solution below deals cleanly with that, since one key can be shared by any number of iterators. Extracting the minimum, and the associated iterables is fast, but the overall solution is still slower than the brute force approach for the 3 hamming iterators.


 >>> def imerge(*iterables):
 ...     cache = {}
 ...     iterators = map(iter,iterables)
 ...     number = len(iterables)
 ...     exhausted = 0
 ...     while 1:
             # First time through, looked at all of them
             # Subsequently, update only the minimum iterators
 ...         for it in iterators:
 ...             try:
                     # Key each iterator by its next() value
                     # Multiple iterators may share the same key
 ...                 cache.setdefault(it.next(),[]).append(it)
 ...             except StopIteration:
 ...                 exhausted += 1
 ...                 if exhausted == number:
 ...                     raise StopIteration
             # Get the lowest value
 ...         val = min(cache)
             # and all the iterators that have that value
 ...         iterators = cache.pop(val) 
 ...         yield val

 >>> list(imerge([1, 2, 3, 6], [1, 2, 3, 7], [1, 2, 3, 4, 5]))
[1, 2, 3, 4, 5, 6, 7]
 >>>

Michael

--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to