Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11395 )
Change subject: KUDU-686 (part 2/2): use DeltaPreparer in DeltaFileIterator ...................................................................... Patch Set 4: (6 comments) http://gerrit.cloudera.org:8080/#/c/11395/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11395/4//COMMIT_MSG@10 PS4, Line 10: service > what's service? I meant "the servicing of deltas" (i.e. calls to ApplyUpdates, ApplyDeletes, etc.) It's an unorthodox application of the word, though, so I'll rephrase to avoid confusion. http://gerrit.cloudera.org:8080/#/c/11395/4//COMMIT_MSG@27 PS4, Line 27: I : don't have a suitable microbenchmark to prove this, > always skeptical of perf improvements that are not measured (and tracked to I ended up writing a microbenchmark for this. It's in the latest version of the patch. http://gerrit.cloudera.org:8080/#/c/11395/4/src/kudu/tablet/delta_store.h File src/kudu/tablet/delta_store.h: http://gerrit.cloudera.org:8080/#/c/11395/4/src/kudu/tablet/delta_store.h@274 PS4, Line 274: // Call when a new delta becomes available in DeltaIterator::PrepareBatch. > nit: maybe move this part before the paragraph above? But I want it to be the last sentence for Seek, Start, Finish, and AddDelta. http://gerrit.cloudera.org:8080/#/c/11395/4/src/kudu/tablet/delta_store.cc File src/kudu/tablet/delta_store.cc: http://gerrit.cloudera.org:8080/#/c/11395/4/src/kudu/tablet/delta_store.cc@95 PS4, Line 95: MvccSnapshot GetFullyInclusiveSnapshot(); > missing docs Agreed; this was from when I tried to make the DeltaPreparer code free of template-based runtime checks. A few already crept in, and one more isn't going to hurt. http://gerrit.cloudera.org:8080/#/c/11395/4/src/kudu/tablet/delta_store.cc@233 PS4, Line 233: DCHECK_LE(cur_prepared_idx_ - prev_prepared_idx_, dst->nrows()); > why again don't these match anymore? maybe doc it? I'll be honest: I don't fully understand. It has to do with the compaction code; DiskRowSetCompactionInput wraps an iterator tree that, when its NextBlock() is called, resizes the provided RowBlock down to the number of rows actually prepared, and this value is then used as input to the delta iterators' PrepareBatch(). However, the vector passed into CollectMutations() remains sized to 100 regardless. Here's what the crash looks like in compaction-test: F0918 16:56:52.669876 18710 delta_store.cc:299] Check failed: cur_prepared_idx_ - prev_prepared_idx_ == dst->size() (15 vs. 100) *** Check failure stack trace: *** *** Aborted at 1537315012 (unix time) try "date -d @1537315012" if you are using GNU date *** PC: @ 0x7f2b0f5ffe97 gsignal *** SIGABRT (@0x3e800004916) received by PID 18710 (TID 0x7f2b09c48900) from PID 18710; stack trace: *** @ 0x7f2b117a9890 (unknown) @ 0x7f2b0f5ffe97 gsignal @ 0x7f2b0f601801 abort @ 0x7f2b10300d59 google::logging_fail() @ 0x7f2b10302d0d google::LogMessage::Fail() @ 0x7f2b10304ce4 google::LogMessage::SendToLog() @ 0x7f2b1030282d google::LogMessage::Flush() @ 0x7f2b103056b9 google::LogMessageFatal::~LogMessageFatal() @ 0x7f2b12042210 kudu::tablet::DeltaPreparer<>::CollectMutations() @ 0x7f2b12021183 kudu::tablet::DMSIterator::CollectMutations() @ 0x7f2b11f7307a kudu::tablet::(anonymous namespace)::DiskRowSetCompactionInput::PrepareBlock() @ 0x7f2b11f7cb38 kudu::tablet::DebugDumpCompactionInput() @ 0x7f2b11f9f3c0 kudu::tablet::DiskRowSet::DebugDump() @ 0x5634b671a439 kudu::tablet::TestCompaction_TestFlushMRSWithRolling_Test::TestBody() @ 0x7f2b119fcf9d testing::internal::HandleExceptionsInMethodIfSupported<>() @ 0x7f2b119f2622 testing::Test::Run() @ 0x7f2b119f2704 testing::TestInfo::Run() @ 0x7f2b119f2847 testing::TestCase::Run() @ 0x7f2b119f2d18 testing::internal::UnitTestImpl::RunAllTests() @ 0x7f2b119fd47d testing::internal::HandleExceptionsInMethodIfSupported<>() @ 0x7f2b119f2e71 testing::UnitTest::Run() @ 0x7f2b12378100 RUN_ALL_TESTS() @ 0x7f2b1237756b main @ 0x7f2b0f5e2b97 __libc_start_main @ 0x5634b67190fa _start http://gerrit.cloudera.org:8080/#/c/11395/3/src/kudu/tablet/deltamemstore.cc File src/kudu/tablet/deltamemstore.cc: http://gerrit.cloudera.org:8080/#/c/11395/3/src/kudu/tablet/deltamemstore.cc@248 PS3, Line 248: // Skip the remaining deltas be > Done. I left a TODO here for myself to also evaluate the optimization; I th I ended up writing a microbenchmark for this (attached). What I found was that we need to skip about 50 updates for the cost of the seek to be less than the cost of the unnecessary delta key decoding. Since most scans are on current data and the number of updates per row is expected to be relatively small, it doesn't seem like we'll be skipping that many updates in the wild. So I reverted the optimization. Raw data available here: https://docs.google.com/spreadsheets/d/1veU7mKPjjRZqo7H7TA4BjxOK4OswOi1XgcWT94KBt8A/edit#gid=0 -- To view, visit http://gerrit.cloudera.org:8080/11395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I87de52092262c4b42c1bd5107f9139edfc3888b5 Gerrit-Change-Number: 11395 Gerrit-PatchSet: 4 Gerrit-Owner: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Comment-Date: Thu, 20 Sep 2018 02:23:31 +0000 Gerrit-HasComments: Yes