Hello Tidy Bot, Mike Percy, David Ribeiro Alves, Todd Lipcon, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/11395 to look at the new patch set (#8). Change subject: KUDU-686 (part 2/2): use DeltaPreparer in DeltaFileIterator ...................................................................... KUDU-686 (part 2/2): use DeltaPreparer in DeltaFileIterator This patch rewrites much of DeltaFileIterator to leverage DeltaPreparer. Besides sharing code, the motivation is to take advantage of the performance improvement inherent to DeltaPreparer: decoding a batch of deltas just once in PrepareBatch() as opposed to on each call to ApplyUpdates(). Seeing as DeltaPreparer was originally built for DMSIterator, here are the various augmentations that were necessary to support DeltaFileIterator: - REINSERT support, which meant more complicated deletion state tracking. - FilterColumnIdsAndCollectDeltas support, cribbed from DeltaFileIterator. - A templatized traits system to control which features were enabled. This also meant templatizing both DeltaPreparer and DeltaFileIterator. - Early out from the "apply all deltas for a row" loop when the timestamp is no longer relevant. There's an opportunity to seek here and skip any remaining deltas belonging to the row, but testing with a new DMS iterator microbenchmark showed that this is only an improvement when the number of deltas skipped is sufficiently high. The improvement should be most noticeable on tables with wide schemas where multiple columns are projected. In these situations, the column-by-column ApplyUpdates() approach incurred a lot of unnecessary delta decoding. To prove it, I included a new deltafile iterator microbenchmark that scans a subset of a wide schema's columns as a DeltaApplier would. Before: Performance counter stats for 'bin/deltafile-test --gtest_filter=*Benchmark*' (5 runs): 11358.256100 task-clock (msec) # 0.998 CPUs utilized ( +- 3.39% ) 140 context-switches # 0.012 K/sec ( +- 27.37% ) 6 cpu-migrations # 0.001 K/sec ( +- 52.36% ) 34,231 page-faults # 0.003 M/sec ( +- 0.42% ) 42,288,292,153 cycles # 3.723 GHz ( +- 4.12% ) 100,853,942,114 instructions # 2.38 insn per cycle ( +- 5.35% ) 19,689,789,259 branches # 1733.522 M/sec ( +- 5.49% ) 69,419,478 branch-misses # 0.35% of all branches ( +- 5.14% ) 11.378958537 seconds time elapsed ( +- 3.38% ) After: Performance counter stats for 'bin/deltafile-test --gtest_filter=*Benchmark*' (5 runs): 4089.419224 task-clock (msec) # 0.995 CPUs utilized ( +- 1.25% ) 43 context-switches # 0.011 K/sec ( +- 4.10% ) 2 cpu-migrations # 0.000 K/sec ( +- 32.39% ) 34,948 page-faults # 0.009 M/sec ( +- 0.22% ) 15,269,907,971 cycles # 3.734 GHz ( +- 1.30% ) 32,409,048,370 instructions # 2.12 insn per cycle ( +- 1.85% ) 5,848,268,795 branches # 1430.098 M/sec ( +- 1.85% ) 32,900,262 branch-misses # 0.56% of all branches ( +- 2.80% ) 4.111096224 seconds time elapsed ( +- 1.18% ) It's interesting to see the number of page faults increase while everything else has gone down, but that makes sense: the new implementation allocates memory in PrepareBatch() in order to optimize the structure of the deltas. Change-Id: I87de52092262c4b42c1bd5107f9139edfc3888b5 --- M src/kudu/tablet/delta_key.h M src/kudu/tablet/delta_store.cc M src/kudu/tablet/delta_store.h M src/kudu/tablet/deltafile-test.cc M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/deltamemstore-test.cc M src/kudu/tablet/deltamemstore.cc M src/kudu/tablet/deltamemstore.h M src/kudu/tablet/mvcc.cc M src/kudu/tablet/mvcc.h M src/kudu/tablet/tablet-test-util.h 12 files changed, 507 insertions(+), 402 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/11395/8 -- To view, visit http://gerrit.cloudera.org:8080/11395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I87de52092262c4b42c1bd5107f9139edfc3888b5 Gerrit-Change-Number: 11395 Gerrit-PatchSet: 8 Gerrit-Owner: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com> Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org>