Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/8090 )
Change subject: KUDU-1125: issue one catalog write per tablet report ...................................................................... Patch Set 6: (10 comments) http://gerrit.cloudera.org:8080/#/c/8090/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8090/6//COMMIT_MSG@10 PS6, Line 10: When the : master is configured to fsync WAL writes, this can add a lot of load during : election storms and when the master is restarted. > any chance you did a before/after test here? eg start a small cluster with Haven't done any perf testing yet. I was planning to do it post-merge, but I'll tackle it now instead. I'll also try to experiment to see whether your estimate for max tablets is correct. It's tough because although it's pretty easy to trigger a full report (restart the master or a tserver), the underlying write performed by that report is likely to be sparse because 3b6ab64 will filter out all tablets that haven't actually changed. Meanwhile, events that do yield real changes (such as an election storm or the creation of a table) tend to be staggered over time. In short, it's hard to create a full report with nothing but real changes. http://gerrit.cloudera.org:8080/#/c/8090/6/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/8090/6/src/kudu/master/catalog_manager.cc@3106 PS6, Line 3106: "num_tablets", full_report.updated_tablets_size()); > would be nice to also use TRACE_COUNTER_INCREMENT("reported_tablets", updat Done http://gerrit.cloudera.org:8080/#/c/8090/6/src/kudu/master/catalog_manager.cc@3122 PS6, Line 3122: unordered_map<string, ReportedTabletUpdatesPB*> updates; > worth adding to the comment explaining ownership of these raw pointers Done http://gerrit.cloudera.org:8080/#/c/8090/6/src/kudu/master/catalog_manager.cc@3151 PS6, Line 3151: ReportedTabletUpdatesPB* update = full_report_update->add_tablets(); > to cut down on small allocations here, consider doing full_report_update->m Done http://gerrit.cloudera.org:8080/#/c/8090/6/src/kudu/master/catalog_manager.cc@3157 PS6, Line 3157: shared_lock<LockType> l(lock_); > I think this would likely perform better if you can acquire this once for t I think I can do that without much refactoring, just need a new scope here. http://gerrit.cloudera.org:8080/#/c/8090/6/src/kudu/master/catalog_manager.cc@3227 PS6, Line 3227: bootstrapping > should this now say "copying"? Yeah. The original wording was "This prevents us from spuriously deleting replicas that have just been added to a pending config and are in the process of catching up to the log entry where they were added to the config." Dan asked me whether it should say "just been added to the committed config and are in the process of bootstrapping" and I misinterpreted his suggestion as "change the wording to this", though that wasn't his intent. http://gerrit.cloudera.org:8080/#/c/8090/6/src/kudu/master/catalog_manager.cc@3249 PS6, Line 3249: with an error > nit: rephrase as "non-deleted tablet which is reporting an error" or somesu Done http://gerrit.cloudera.org:8080/#/c/8090/6/src/kudu/master/catalog_manager.cc@3268 PS6, Line 3268: if (!cstate.committed_config().has_opid_index()) { > nit: would it be possible to move this if statement up a few lines before c Yeah, I think that'd be safe. http://gerrit.cloudera.org:8080/#/c/8090/6/src/kudu/master/catalog_manager.cc@3285 PS6, Line 3285: // TODO(adar): ShouldTransitionTabletToRunning doesn't take step 7b into : // account. Should it? > not sure I understand this question -- isn't the condition from ShouldTrans This TODO is trying to draw attention to the fact that ShouldTransitionTabletToRunning when examining the replica's ConsensusStatePB, goes straight to report rather than looking at the local 'cstate' (which may have had its leader_uuid() cleared). I am comfortable changing it to use 'cstate', but I wasn't sure whether that'd be correct. http://gerrit.cloudera.org:8080/#/c/8090/6/src/kudu/master/catalog_manager.cc@3344 PS6, Line 3344: ConsensusStatePB prev_cstate = tablet->metadata().state().pb.consensus_state(); > this is a little confusing - this is shadowing a variable with the same nam Ugh, yes. Given the COW nature of the persistent data, it should be safe to reuse the existing prev_cstate even though we're mutating the cstate. -- To view, visit http://gerrit.cloudera.org:8080/8090 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie6f5cf0e4b1cd1160b3b310d89c6dbf3dd62e43b Gerrit-Change-Number: 8090 Gerrit-PatchSet: 6 Gerrit-Owner: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Dan Burkert <danburk...@apache.org> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Comment-Date: Tue, 03 Oct 2017 01:08:42 +0000 Gerrit-HasComments: Yes