[kudu-CR] experiments: merge iterator optimization tests

2019-04-10 Thread Adar Dembo (Code Review)
Adar Dembo has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/12587 )

Change subject: experiments: merge iterator optimization tests
..

experiments: merge iterator optimization tests

Here's a brief exploration into various MergeIterator algorithms, prototyped
in Python. Only after I was done did I see that there was an existing
experiment on this same subject in C++ (see merge-test.cc). It's not all
wasted work though; that experiment didn't include the new "hot/cold" heap
algorithms, nor did it account for all MergeIterator quirks such as paged
blocks and lower/upper bounds.

Below are some timing results on a big el7 machine. The "real" input was a
representative (i.e. mostly compacted) 40GB tablet:
- NaiveMergeIterator, half-overlapping: 44.7854778767s Counter({'cmp': 
25291510, 'peak_blocks_in_mem': 100})
- SingleHeapMergeIterator, half-overlapping: 11.020619154s Counter({'cmp': 
10266988, 'peak_blocks_in_mem': 3})
- DoubleHeapMergeIterator, half-overlapping: 3.72211503983s Counter({'cmp': 
1178497, 'peak_blocks_in_mem': 3})
- TripleHeapMergeIterator, half-overlapping: 3.52963089943s Counter({'cmp': 
1071682, 'peak_blocks_in_mem': 3})
- NaiveMergeIterator, non-overlapping: 44.3896560669s Counter({'cmp': 25958482, 
'peak_blocks_in_mem': 100})
- SingleHeapMergeIterator, non-overlapping: 10.9636461735s Counter({'cmp': 
10598336, 'peak_blocks_in_mem': 1})
- DoubleHeapMergeIterator, non-overlapping: 2.80402898788s Counter({'cmp': 
4021, 'peak_blocks_in_mem': 1})
- TripleHeapMergeIterator, non-overlapping: 2.83524298668s Counter({'cmp': 
4021, 'peak_blocks_in_mem': 1})
- NaiveMergeIterator, overlapping: 80.1467709541s Counter({'cmp': 47662665, 
'peak_blocks_in_mem': 100})
- SingleHeapMergeIterator, overlapping: 9.61102318764s Counter({'cmp': 8554237, 
'peak_blocks_in_mem': 100})
- DoubleHeapMergeIterator, overlapping: 9.68881893158s Counter({'cmp': 8553345, 
'peak_blocks_in_mem': 100})
- TripleHeapMergeIterator, overlapping: 9.55243206024s Counter({'cmp': 8563292, 
'peak_blocks_in_mem': 100})
- NaiveMergeIterator, real: 1099763.37405s Counter({'cmp': 578660759971, 
'peak_blocks_in_mem': 1294})
- SingleHeapMergeIterator, real: 30513.3831122s Counter({'cmp': 30785961774, 
'peak_blocks_in_mem': 5})
- DoubleHeapMergeIterator, real: 7987.11197996s Counter({'cmp': 4173739455, 
'peak_blocks_in_mem': 15})
- TripleHeapMergeIterator, real: 7155.59520698s Counter({'cmp': 2784969619, 
'peak_blocks_in_mem': 5})

Change-Id: I6ae1d2f9e4f41337f475146c648cbab122395f83
Reviewed-on: http://gerrit.cloudera.org:8080/12587
Tested-by: Kudu Jenkins
Reviewed-by: Grant Henke 
---
A src/kudu/experiments/merge-test.py
1 file changed, 519 insertions(+), 0 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Grant Henke: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/12587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I6ae1d2f9e4f41337f475146c648cbab122395f83
Gerrit-Change-Number: 12587
Gerrit-PatchSet: 7
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] experiments: merge iterator optimization tests

2019-04-10 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12587 )

Change subject: experiments: merge iterator optimization tests
..


Patch Set 6: Code-Review+2

Reviewing with a +2 without a thorough review given the primary value of this 
is for "archival" reference and a starting point for any future work that is 
similar. I don't think full fledged review is required.


--
To view, visit http://gerrit.cloudera.org:8080/12587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6ae1d2f9e4f41337f475146c648cbab122395f83
Gerrit-Change-Number: 12587
Gerrit-PatchSet: 6
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 10 Apr 2019 18:55:32 +
Gerrit-HasComments: No


[kudu-CR] experiments: merge iterator optimization tests

2019-04-09 Thread Adar Dembo (Code Review)
Hello Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12587

to look at the new patch set (#3).

Change subject: experiments: merge iterator optimization tests
..

experiments: merge iterator optimization tests

Here's a brief exploration into various MergeIterator algorithms, prototyped
in Python. Only after I was done did I see that there was an existing
experiment on this same subject in C++ (see merge-test.cc). It's not all
wasted work though; that experiment didn't include the new "hot/cold" heap
algorithms, nor did it account for all MergeIterator quirks such as paged
blocks and lower/upper bounds.

Below are some timing results on a big el7 machine. The "real" input was a
representative (i.e. mostly compacted) 40GB tablet:
- NaiveMergeIterator, half-overlapping: 44.7854778767s Counter({'cmp': 
25291510, 'peak_blocks_in_mem': 100})
- SingleHeapMergeIterator, half-overlapping: 11.020619154s Counter({'cmp': 
10266988, 'peak_blocks_in_mem': 3})
- DoubleHeapMergeIterator, half-overlapping: 3.72211503983s Counter({'cmp': 
1178497, 'peak_blocks_in_mem': 3})
- TripleHeapMergeIterator, half-overlapping: 3.52963089943s Counter({'cmp': 
1071682, 'peak_blocks_in_mem': 3})
- NaiveMergeIterator, non-overlapping: 44.3896560669s Counter({'cmp': 25958482, 
'peak_blocks_in_mem': 100})
- SingleHeapMergeIterator, non-overlapping: 10.9636461735s Counter({'cmp': 
10598336, 'peak_blocks_in_mem': 1})
- DoubleHeapMergeIterator, non-overlapping: 2.80402898788s Counter({'cmp': 
4021, 'peak_blocks_in_mem': 1})
- TripleHeapMergeIterator, non-overlapping: 2.83524298668s Counter({'cmp': 
4021, 'peak_blocks_in_mem': 1})
- NaiveMergeIterator, overlapping: 80.1467709541s Counter({'cmp': 47662665, 
'peak_blocks_in_mem': 100})
- SingleHeapMergeIterator, overlapping: 9.61102318764s Counter({'cmp': 8554237, 
'peak_blocks_in_mem': 100})
- DoubleHeapMergeIterator, overlapping: 9.68881893158s Counter({'cmp': 8553345, 
'peak_blocks_in_mem': 100})
- TripleHeapMergeIterator, overlapping: 9.55243206024s Counter({'cmp': 8563292, 
'peak_blocks_in_mem': 100})
- NaiveMergeIterator, real: 1099763.37405s Counter({'cmp': 578660759971, 
'peak_blocks_in_mem': 1294})
- SingleHeapMergeIterator, real: 30513.3831122s Counter({'cmp': 30785961774, 
'peak_blocks_in_mem': 5})
- DoubleHeapMergeIterator, real: 7987.11197996s Counter({'cmp': 4173739455, 
'peak_blocks_in_mem': 15})
- TripleHeapMergeIterator, real: 7155.59520698s Counter({'cmp': 2784969619, 
'peak_blocks_in_mem': 5})

Change-Id: I6ae1d2f9e4f41337f475146c648cbab122395f83
---
A src/kudu/experiments/merge-test.py
1 file changed, 519 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/87/12587/3
--
To view, visit http://gerrit.cloudera.org:8080/12587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ae1d2f9e4f41337f475146c648cbab122395f83
Gerrit-Change-Number: 12587
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] experiments: merge iterator optimization tests

2019-02-26 Thread Adar Dembo (Code Review)
Hello Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12587

to look at the new patch set (#2).

Change subject: experiments: merge iterator optimization tests
..

experiments: merge iterator optimization tests

Here's a brief exploration into various MergeIterator algorithms, prototyped
in Python. Only after I was done did I see that there was an existing
experiment on this same subject in C++ (see merge-test.cc). It's not all
wasted work though; that experiment didn't include the new "hot/cold" double
heap algorithm, nor did it account for all MergeIterator quirks such as
paged blocks and lower/upper bounds.

Below are some timing results on my laptop, using the default parameters:
- NaiveMergeIterator, half-overlapping: 0.539506912231s Counter({'cmp': 500021, 
'peak_blocks_in_mem': 100})
- SingleHeapMergeIterator, half-overlapping: 0.138118982315s Counter({'cmp': 
203902, 'peak_blocks_in_mem': 3})
- DoubleHeapMergeIterator, half-overlapping: 0.0430920124054s Counter({'cmp': 
29012, 'peak_blocks_in_mem': 3})
- NaiveMergeIterator, non-overlapping: 0.564009904861s Counter({'cmp': 495000, 
'peak_blocks_in_mem': 100})
- SingleHeapMergeIterator, non-overlapping: 0.152870893478s Counter({'cmp': 
202848, 'peak_blocks_in_mem': 1})
- DoubleHeapMergeIterator, non-overlapping: 0.0303959846497s Counter({'cmp': 
4452, 'peak_blocks_in_mem': 1})
- NaiveMergeIterator, overlapping: 1.08536100388s Counter({'cmp': 980399, 
'peak_blocks_in_mem': 100})
- SingleHeapMergeIterator, overlapping: 0.120500087738s Counter({'cmp': 176355, 
'peak_blocks_in_mem': 100})
- DoubleHeapMergeIterator, overlapping: 0.115973949432s Counter({'cmp': 179273, 
'peak_blocks_in_mem': 100})

The new algorithm shows some slight overhead compared to a heap-based merge
when input is overlapped, but really shines when it's non-overlapped (i.e.
when the tablet is fully compacted).

Change-Id: I6ae1d2f9e4f41337f475146c648cbab122395f83
---
A src/kudu/experiments/merge-test.py
1 file changed, 351 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/87/12587/2
--
To view, visit http://gerrit.cloudera.org:8080/12587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ae1d2f9e4f41337f475146c648cbab122395f83
Gerrit-Change-Number: 12587
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] experiments: merge iterator optimization tests

2019-02-26 Thread Adar Dembo (Code Review)
Hello Todd Lipcon,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/12587

to review the following change.


Change subject: experiments: merge iterator optimization tests
..

experiments: merge iterator optimization tests

Here's a brief exploration into various MergeIterator algorithms, prototyped
in Python. Only after I was done did I see that there was an existing
experiment on this same subject in C++ (see merge-test.cc). It's not all
wasted work though; that experiment didn't include this algorithm, nor did
it account for all MergeIterator quirks, such as paged blocks and
lower/upper bounds.

Below are some timing results on my laptop, using the default parameters.
I've omitted the "naive" results because they're much much slower and thus
not worth discussing:
- SingleHeapMergeIterator with non-overlapped input: 2.81650519371s
- DoubleHeapMergeIterator with non-overlapped input: 2.9703636s
- SingleHeapMergeIterator with overlapped input: 3.01065206528s
- DoubleHeapMergeIterator with overlapped input: 2.99612498283s

The new "hot/cold" algorithm is comparable to a heap-based merge when input
is overlapped, but really shines when it's non-overlapped (i.e. when the
tablet is fully compacted).

Change-Id: I6ae1d2f9e4f41337f475146c648cbab122395f83
---
A src/kudu/experiments/merge-test.py
1 file changed, 295 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/87/12587/1
--
To view, visit http://gerrit.cloudera.org:8080/12587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6ae1d2f9e4f41337f475146c648cbab122395f83
Gerrit-Change-Number: 12587
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Todd Lipcon