Dennis Sweeney <sweeney.dennis...@gmail.com> added the comment:
As Serhiy suggested, keeping the algorithm but moving the Python implementation to be a generator again (as I recently changed in PR 18427) gives another performance boost (although this unrolling is many lines of code). Timing the C implementation: .\python.bat -m pyperf timeit -s "from random import random; from collections import deque; from heapq import merge; iters = [sorted(random() for j in range(10_000)) for i in range(20)]" "deque(merge(*iters), maxlen=0)" On Master: Mean +- std dev: 66.9 ms +- 0.6 ms With PR: Mean +- std dev: 17.3 ms +- 0.9 ms Timing the Python Implementation in CPython: .\python.bat -m pyperf timeit -s "from random import random; from collections import deque; from test import support; merge = support.import_fresh_module('heapq', blocked=['_heapq']).merge; iters = [sorted(random() for j in range(10_000)) for i in range(20)]" "deque(merge(*iters), maxlen=0)" On Master: Mean +- std dev: 385 ms +- 3 ms With PR: Mean +- std dev: 250 ms +- 2 ms Timing PyPy: pypy -m pyperf timeit -s "... from heapq import merge; ..." ... versus pypy -m pyperf timeit -s "... from heapq2 import merge; ..." ... Testing on PyPy 7.1.1: Mean +- std dev: 142 ms +- 2 ms This implementation: Mean +- std dev: 38.0 ms +- 1.2 ms ============================================================================= Another positive for this proposal I just discovered is that object.__eq__ no longer needs to be called at each comparison: from heapq import merge from collections import deque class Int(int): lt = eq = 0 def __lt__(self, other): __class__.lt += 1 return int.__lt__(self, other) def __eq__(self, other): __class__.eq += 1 return int.__eq__(self, other) def comparisons(iterables): Int.lt = Int.eq = 0 deque(merge(*iterables), maxlen=0) return Int.lt, Int.eq no_overlap = comparisons( # (0..999), (1_000..1_999), (2_000..2_999), ... map(Int, range(x, x+1_000)) for x in range(0, 16_000, 1_000) ) interleaved = comparisons( # (0,16,32,...), (1,17,33,...), (2,18,34,...), ... map(Int, range(x, 16_000, 16)) for x in range(16) ) print("No overlap: {:,} lt; {:,} eq".format(*no_overlap)) print("Interleaved: {:,} lt; {:,} eq".format(*interleaved)) Before: No overlap: 65,004 lt; 65,004 eq Interleaved: 64,004 lt; 64,004 eq After: No overlap: 32,000 lt; 0 eq Interleaved: 63,968 lt; 0 eq This comes from the way that tuples are compared by scanning item-wise for equality before comparing the first discrepancy. Using the positional information in the tree with the logic yield (right if right < left else left) requires only one rich comparison, while breaking ties with the stored index and the logic yield (b if (b, b_index) < (a, a_index) else a) requires two arbitrary rich comparisons (a != b, then a < b). This can be somewhat fixed with the logic if a_index < b_index: yield (b if b < a else a) else: yield (a if a < b else b) ...but this is another branch in the innermost loop. ============================================================================== I also played with a "Tree of Losers" method[1] here[2], and on the same benchmarks I got: CPython (C): Mean +- std dev: 22.7 ms +- 0.2 ms (slower than in PR 18427) CPython (Pure Python): Mean +- std dev: 197 ms +- 9 ms (faster -- likely because of fewer int operations) PyPy: Mean +- std dev: 38.8 ms +- 0.8 ms (about the same) Maybe some more optimizations could be made to that code. The two approaches should be "isomorphic" in some sense: both should do the same number of comparisons on the same data. But I'll emphasize the differences: Tree of Losers: - Data structure: Loser Tree (does not satisfy heap invariant) - Nodes: (item, index) pairs - Tree has n nodes - To get the next item: do exchanges from leaf to root, swapping as you find better items - Comparisons: branching logic based on index comparison PR 18427 "Tree of winners": - Data structure: binary heap (always satisfies heap invariant) - Nodes: just the items - Tree has 2n-1 nodes (more eagerly evaluated) - To get the next item: replace parent with better child, root to leaf - Comparisons: right < left (simple) Personally, I find the Tree of Winners approach more intuitive, but the Tree of Losers approach seems to be the one that comes up more in the literature. [1] https://en.wikipedia.org/wiki/K-way_merge_algorithm [2] https://github.com/sweeneyde/cpython/tree/replacement-selection ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue38938> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com