Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11395 )

Change subject: KUDU-686 (part 2/2): use DeltaPreparer in DeltaFileIterator
......................................................................


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/11395/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11395/4//COMMIT_MSG@10
PS4, Line 10: service
> what's service?
I meant "the servicing of deltas" (i.e. calls to ApplyUpdates, ApplyDeletes, 
etc.)

It's an unorthodox application of the word, though, so I'll rephrase to avoid 
confusion.


http://gerrit.cloudera.org:8080/#/c/11395/4//COMMIT_MSG@27
PS4, Line 27: I
            : don't have a suitable microbenchmark to prove this,
> always skeptical of perf improvements that are not measured (and tracked to
I ended up writing a microbenchmark for this. It's in the latest version of the 
patch.


http://gerrit.cloudera.org:8080/#/c/11395/4/src/kudu/tablet/delta_store.h
File src/kudu/tablet/delta_store.h:

http://gerrit.cloudera.org:8080/#/c/11395/4/src/kudu/tablet/delta_store.h@274
PS4, Line 274:   // Call when a new delta becomes available in 
DeltaIterator::PrepareBatch.
> nit: maybe move this part before the paragraph above?
But I want it to be the last sentence for Seek, Start, Finish, and AddDelta.


http://gerrit.cloudera.org:8080/#/c/11395/4/src/kudu/tablet/delta_store.cc
File src/kudu/tablet/delta_store.cc:

http://gerrit.cloudera.org:8080/#/c/11395/4/src/kudu/tablet/delta_store.cc@95
PS4, Line 95: MvccSnapshot GetFullyInclusiveSnapshot();
> missing docs
Agreed; this was from when I tried to make the DeltaPreparer code free of 
template-based runtime checks. A few already crept in, and one more isn't going 
to hurt.


http://gerrit.cloudera.org:8080/#/c/11395/4/src/kudu/tablet/delta_store.cc@233
PS4, Line 233:   DCHECK_LE(cur_prepared_idx_ - prev_prepared_idx_, 
dst->nrows());
> why again don't these match anymore? maybe doc it?
I'll be honest: I don't fully understand. It has to do with the compaction 
code; DiskRowSetCompactionInput wraps an iterator tree that, when its 
NextBlock() is called, resizes the provided RowBlock down to the number of rows 
actually prepared, and this value is then used as input to the delta iterators' 
PrepareBatch(). However, the vector passed into CollectMutations() remains 
sized to 100 regardless.

Here's what the crash looks like in compaction-test:
  F0918 16:56:52.669876 18710 delta_store.cc:299] Check failed: 
cur_prepared_idx_ - prev_prepared_idx_ == dst->size() (15 vs. 100)
  *** Check failure stack trace: ***
  *** Aborted at 1537315012 (unix time) try "date -d @1537315012" if you are 
using GNU date ***
  PC: @     0x7f2b0f5ffe97 gsignal
  *** SIGABRT (@0x3e800004916) received by PID 18710 (TID 0x7f2b09c48900) from 
PID 18710; stack trace: ***
    @     0x7f2b117a9890 (unknown)
    @     0x7f2b0f5ffe97 gsignal
    @     0x7f2b0f601801 abort
    @     0x7f2b10300d59 google::logging_fail()
    @     0x7f2b10302d0d google::LogMessage::Fail()
    @     0x7f2b10304ce4 google::LogMessage::SendToLog()
    @     0x7f2b1030282d google::LogMessage::Flush()
    @     0x7f2b103056b9 google::LogMessageFatal::~LogMessageFatal()
    @     0x7f2b12042210 kudu::tablet::DeltaPreparer<>::CollectMutations()
    @     0x7f2b12021183 kudu::tablet::DMSIterator::CollectMutations()
    @     0x7f2b11f7307a kudu::tablet::(anonymous 
namespace)::DiskRowSetCompactionInput::PrepareBlock()
    @     0x7f2b11f7cb38 kudu::tablet::DebugDumpCompactionInput()
    @     0x7f2b11f9f3c0 kudu::tablet::DiskRowSet::DebugDump()
    @     0x5634b671a439 
kudu::tablet::TestCompaction_TestFlushMRSWithRolling_Test::TestBody()
    @     0x7f2b119fcf9d 
testing::internal::HandleExceptionsInMethodIfSupported<>()
    @     0x7f2b119f2622 testing::Test::Run()
    @     0x7f2b119f2704 testing::TestInfo::Run()
    @     0x7f2b119f2847 testing::TestCase::Run()
    @     0x7f2b119f2d18 testing::internal::UnitTestImpl::RunAllTests()
    @     0x7f2b119fd47d 
testing::internal::HandleExceptionsInMethodIfSupported<>()
    @     0x7f2b119f2e71 testing::UnitTest::Run()
    @     0x7f2b12378100 RUN_ALL_TESTS()
    @     0x7f2b1237756b main
    @     0x7f2b0f5e2b97 __libc_start_main
    @     0x5634b67190fa _start


http://gerrit.cloudera.org:8080/#/c/11395/3/src/kudu/tablet/deltamemstore.cc
File src/kudu/tablet/deltamemstore.cc:

http://gerrit.cloudera.org:8080/#/c/11395/3/src/kudu/tablet/deltamemstore.cc@248
PS3, Line 248: // Skip the remaining deltas be
> Done. I left a TODO here for myself to also evaluate the optimization; I th
I ended up writing a microbenchmark for this (attached). What I found was that 
we need to skip about 50 updates for the cost of the seek to be less than the 
cost of the unnecessary delta key decoding. Since most scans are on current 
data and the number of updates per row is expected to be relatively small, it 
doesn't seem like we'll be skipping that many updates in the wild. So I 
reverted the optimization.

Raw data available here: 
https://docs.google.com/spreadsheets/d/1veU7mKPjjRZqo7H7TA4BjxOK4OswOi1XgcWT94KBt8A/edit#gid=0



--
To view, visit http://gerrit.cloudera.org:8080/11395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I87de52092262c4b42c1bd5107f9139edfc3888b5
Gerrit-Change-Number: 11395
Gerrit-PatchSet: 4
Gerrit-Owner: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Comment-Date: Thu, 20 Sep 2018 02:23:31 +0000
Gerrit-HasComments: Yes

Reply via email to