Todd Lipcon has posted comments on this change.

Change subject: KUDU-798 (part 5) Correct safe time advancement
......................................................................


Patch Set 32:

(35 comments)

http://gerrit.cloudera.org:8080/#/c/5240/32//COMMIT_MSG
Commit Message:

PS32, Line 31: whole
hole


PS32, Line 56: it
is


PS32, Line 52: 
             : - TimeManager now does a pre-flight check before waiting on safe 
time.
             :   In particular it checks that: i) it has heard from the leader 
within
             :   a configurable amount of time (that safe time _can_ make 
progress).
             :   ii) it checks that the safe time it not more that a 
configurable
             :   amount of time in the past, 30 seconds by default (that safe 
time
             :   is likely to make progress to the required timestamp).
I haven't looked at the code yet, but one question here: these checks should be 
only relevant for 'now' or 'recent past' type snapshot queries. If you queried 
30 seconds in the past, for example, then the preflight check is not necessary, 
right?

Put another way, we should only do this check if the requested snapshot is not 
safe yet (ie if we'd actually have to wait at all).

Ignore this comment if you're already doing this :)


PS32, Line 65: unique
any particular reason why not the one with dup keys too?


PS32, Line 71: workloaf
typo


http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/client/client-test-util.cc
File src/kudu/client/client-test-util.cc:

Line 58:   ASSERT_OK(scanner.SetSelection(KuduClient::LEADER_ONLY));
is LEADER_ONLY still relevant here? or should this be changed to set it to 
SNAPSHOT at current time? (feel free to add TODO if it's not a trivial change)


http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/client/client-test.cc
File src/kudu/client/client-test.cc:

Line 1147: TEST_F(ClientTest, TestScanFaultTolerance) {
did you loop this test too?


http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

Line 456:       KLOG_EVERY_N_SECS(WARNING, 10) << "Safe time advancement 
without writes is disabled. "
hrm, maybe FIRST_K is better here? not sure seeing it once every 10 seconds is 
that useful. (or maybe once every 5 minutes or something?)


http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 1238:     if (request->has_safe_timestamp()) {
I'm still not sure this is in the right spot if the tail of the messages in the 
request failed to prepare. Agree right now it's not relevant, but once the 
leader starts sending this alongside ops, this behavior isn't right


http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/consensus/time_manager.cc
File src/kudu/consensus/time_manager.cc:

PS32, Line 31: safe_time_max_last_advanced_safe_time
not a big fan of this name. How about: 
missed_heartbeats_before_rejecting_snapshot_scans or something of that nature?


PS32, Line 33: snapshot scans
scans at snapshots that aren't yet safe


Line 34: TAG_FLAG(safe_time_max_last_advanced_safe_time, advanced);
let's mark this unstable for now too


PS32, Line 37: lag
lag behind the requested timestamp?


Line 39: TAG_FLAG(safe_time_max_lag_ms, advanced);
same


PS32, Line 151:  $0 secs, (max is $1 sec
'secs' is redundant here since MonoDelta::ToString already has a 's' suffix


Line 166:                                 safe_time_diff.ToMilliseconds(),
wrong units here (message says secs)


Line 167:                                 FLAGS_safe_time_max_lag_ms);
same


Line 173: void TimeManager::MakeWaiterTimeoutMessageUnlocked(Timestamp 
timestamp, string* error_message) {
probably easier to just return string from this method instead of the 
out-param. Or just inline it at the call site (I only see it once)


Line 188:     if (mode_ == NON_LEADER && timestamp > last_safe_ts_) {
ah ok, so you are doing the thing that I asked about in the commit message. 
nice.


http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/consensus/time_manager.h
File src/kudu/consensus/time_manager.h:

PS32, Line 117: is 
if


http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

Line 2345:       "--maintenance_manager_polling_interval_ms=300",
typo?


Line 2664:   //workload.set_num_read_threads(2);
hrm?


http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/integration-tests/test_workload.cc
File src/kudu/integration-tests/test_workload.cc:

Line 270:   start_latch_.Reset((uint64_t)num_write_threads_ + 
(uint64_t)num_read_threads_);
nit: why these casts?


http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/tools/tool_action_remote_replica.cc
File src/kudu/tools/tool_action_remote_replica.cc:

Line 100:     // and thus cannot do snapshot scans.
I think ORDERED is still useful here.


http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

PS32, Line 84: time
'of time (in milliseconds)' and remove 'in milliseconds' from the end of the 
sentence


PS32, Line 85: allows to
'allows us to' or 'allows waiting'


PS32, Line 85: it's 
its


Line 1329:               ": has no lower or upper bound.");
lots of reindentation in this file. mind reverting that part of the patch?


PS32, Line 1387: Retunrs
typo


PS32, Line 1388: CheckIfAncientHistory
rename to 'VerifyNotAncientHistory' or 'VerifyAfterAHM' or something. Otherwise 
it's not clear from the method name what the return status indicates


Line 1759: MonoTime ClampDeadline(const MonoTime& deadline, bool* was_clamped) {
rename to ClampScanDeadlineForWait or something so it's clearer where this is 
used.


PS32, Line 1816: ahm
nit: AHM


PS32, Line 1842: allows to
allows us to


PS32, Line 1850: s.ok() 
!s.ok() is redundant here with s.IsTimedout()


Line 1861:   if (!s.ok() && s.IsTimedOut() && was_clamped) {
same

Could you combine these two cases by just making the above be:

if (s.ok()) {
  s = mvcc_manager->waitForSnapshotWithAllCommitted...
}

?


-- 
To view, visit http://gerrit.cloudera.org:8080/5240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8532fdb069c8bee7f3e08ffe74cab0273885cc8e
Gerrit-PatchSet: 32
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to