Adar Dembo has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12587 )
Change subject: experiments: merge iterator optimization tests
......................................................................

experiments: merge iterator optimization tests

Here's a brief exploration into various MergeIterator algorithms,
prototyped in Python.

Only after I was done did I see that there was an existing experiment on
this same subject in C++ (see merge-test.cc). It's not all wasted work
though; that experiment didn't include the new "hot/cold" heap algorithms,
nor did it account for all MergeIterator quirks such as paged blocks and
lower/upper bounds.

Below are some timing results on a big el7 machine. The "real" input was a
representative (i.e. mostly compacted) 40GB tablet:

- NaiveMergeIterator, half-overlapping: 44.7854778767s
  Counter({'cmp': 25291510, 'peak_blocks_in_mem': 100})
- SingleHeapMergeIterator, half-overlapping: 11.020619154s
  Counter({'cmp': 10266988, 'peak_blocks_in_mem': 3})
- DoubleHeapMergeIterator, half-overlapping: 3.72211503983s
  Counter({'cmp': 1178497, 'peak_blocks_in_mem': 3})
- TripleHeapMergeIterator, half-overlapping: 3.52963089943s
  Counter({'cmp': 1071682, 'peak_blocks_in_mem': 3})

- NaiveMergeIterator, non-overlapping: 44.3896560669s
  Counter({'cmp': 25958482, 'peak_blocks_in_mem': 100})
- SingleHeapMergeIterator, non-overlapping: 10.9636461735s
  Counter({'cmp': 10598336, 'peak_blocks_in_mem': 1})
- DoubleHeapMergeIterator, non-overlapping: 2.80402898788s
  Counter({'cmp': 4021, 'peak_blocks_in_mem': 1})
- TripleHeapMergeIterator, non-overlapping: 2.83524298668s
  Counter({'cmp': 4021, 'peak_blocks_in_mem': 1})

- NaiveMergeIterator, overlapping: 80.1467709541s
  Counter({'cmp': 47662665, 'peak_blocks_in_mem': 100})
- SingleHeapMergeIterator, overlapping: 9.61102318764s
  Counter({'cmp': 8554237, 'peak_blocks_in_mem': 100})
- DoubleHeapMergeIterator, overlapping: 9.68881893158s
  Counter({'cmp': 8553345, 'peak_blocks_in_mem': 100})
- TripleHeapMergeIterator, overlapping: 9.55243206024s
  Counter({'cmp': 8563292, 'peak_blocks_in_mem': 100})

- NaiveMergeIterator, real: 1099763.37405s
  Counter({'cmp': 578660759971, 'peak_blocks_in_mem': 1294})
- SingleHeapMergeIterator, real: 30513.3831122s
  Counter({'cmp': 30785961774, 'peak_blocks_in_mem': 5})
- DoubleHeapMergeIterator, real: 7987.11197996s
  Counter({'cmp': 4173739455, 'peak_blocks_in_mem': 15})
- TripleHeapMergeIterator, real: 7155.59520698s
  Counter({'cmp': 2784969619, 'peak_blocks_in_mem': 5})

Change-Id: I6ae1d2f9e4f41337f475146c648cbab122395f83
Reviewed-on: http://gerrit.cloudera.org:8080/12587
Tested-by: Kudu Jenkins
Reviewed-by: Grant Henke <granthe...@apache.org>
---
A src/kudu/experiments/merge-test.py
1 file changed, 519 insertions(+), 0 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Grant Henke: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/12587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I6ae1d2f9e4f41337f475146c648cbab122395f83
Gerrit-Change-Number: 12587
Gerrit-PatchSet: 7
Gerrit-Owner: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
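
For readers wondering where the 'cmp' gap between the naive and single-heap results in the commit message comes from, here is a minimal sketch. It is not the code from merge-test.py: the names Entry, naive_merge, and heap_merge are hypothetical, the inputs are plain sorted Python lists, and it models neither paged blocks, lower/upper bounds, nor the "hot/cold" heap variants. It only counts key comparisons for a scan-for-minimum merge versus a merge driven by one min-heap.

```python
import heapq
from collections import Counter


class Entry:
    """Head-of-run record; every '<' comparison is tallied in a shared Counter."""

    def __init__(self, key, run_idx, pos, stats):
        self.key, self.run_idx, self.pos, self.stats = key, run_idx, pos, stats

    def __lt__(self, other):
        self.stats['cmp'] += 1
        return self.key < other.key


def naive_merge(runs):
    """Merge sorted lists by scanning every live run head for the minimum."""
    stats = Counter()
    heads = [Entry(run[0], i, 0, stats) for i, run in enumerate(runs) if run]
    out = []
    while heads:
        best = heads[0]
        for entry in heads[1:]:
            if entry < best:
                best = entry
        out.append(best.key)
        run, nxt = runs[best.run_idx], best.pos + 1
        if nxt < len(run):
            best.key, best.pos = run[nxt], nxt  # advance within the same run
        else:
            heads.remove(best)  # run exhausted
    return out, stats


def heap_merge(runs):
    """Merge sorted lists using a single min-heap of run heads."""
    stats = Counter()
    heap = [Entry(run[0], i, 0, stats) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        top = heap[0]
        out.append(top.key)
        run, nxt = runs[top.run_idx], top.pos + 1
        if nxt < len(run):
            top.key, top.pos = run[nxt], nxt
            heapq.heapreplace(heap, top)  # re-sift the entry under its new key
        else:
            heapq.heappop(heap)  # run exhausted
    return out, stats
```

Both functions return the merged list together with a Counter, so the two 'cmp' totals can be compared on the same input. With many runs the naive scan pays roughly one comparison per live run for each emitted value, while the heap pays a number of comparisons that grows with the logarithm of the run count.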