David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-798 (part 5) Correct safe time advancement
......................................................................


Patch Set 32:

(35 comments)

http://gerrit.cloudera.org:8080/#/c/5240/32//COMMIT_MSG
Commit Message:

PS32, Line 31: whole
> hole
doh


PS32, Line 56: it
> is
Done


PS32, Line 52: 
             : - TimeManager now does a pre-flight check before waiting on safe 
time.
             :   In particular it checks that: i) it has heard from the leader 
within
             :   a configurable amount of time (that safe time _can_ make 
progress).
             :   ii) it checks that the safe time it not more that a 
configurable
             :   amount of time in the past, 30 seconds by default (that safe 
time
             :   is likely to make progress to the required timestamp).
> I haven't looked at the code yet, but one question here: these checks shoul
doing this already, yeah :)


PS32, Line 65: unique
> any particular reason why not the one with dup keys too?
because the duplicate keys one only has 20 different rows and the test works 
with row counts.


PS32, Line 71: workloaf
> typo
Done


http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/client/client-test-util.cc
File src/kudu/client/client-test-util.cc:

Line 58:   ASSERT_OK(scanner.SetSelection(KuduClient::LEADER_ONLY));
> is LEADER_ONLY still relevant here? or should this be changed to set it to 
left a TODO think we should cleanup this retry code.


http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/client/client-test.cc
File src/kudu/client/client-test.cc:

Line 1147: TEST_F(ClientTest, TestScanFaultTolerance) {
> did you loop this test too?
I had looped it quite a bit cuz it was flaky, but forgot to keep the results. 
Just looped it 1000 more. 1000 passes :) Updated the commit message.


http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

Line 456:       KLOG_EVERY_N_SECS(WARNING, 10) << "Safe time advancement 
without writes is disabled. "
> hrm, maybe FIRST_K is better here? not sure seeing it once every 10 seconds
made it minutes Done


http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 1238:     if (request->has_safe_timestamp()) {
> I'm still not sure this is in the right spot if the tail of the messages in
yeah I agree that we shouldn't take the safe timestamp into account if we 
failed to prepare. left a todo.


http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/consensus/time_manager.cc
File src/kudu/consensus/time_manager.cc:

PS32, Line 31: safe_time_max_last_advanced_safe_time
> not a big fan of this name. How about: missed_heartbeats_before_rejecting_s
Done


PS32, Line 33: snapshot scans
> scans at snapshots that aren't yet safe
Done


Line 34: TAG_FLAG(safe_time_max_last_advanced_safe_time, advanced);
> let's mark this unstable for now too
Done


PS32, Line 37: lag
> lag behind the requested timestamp?
Done


Line 39: TAG_FLAG(safe_time_max_lag_ms, advanced);
> same
Done


PS32, Line 151:  $0 secs, (max is $1 sec
> 'secs' is redundant here since MonoDelta::ToString already has a 's' suffix
Done


Line 166:                                 safe_time_diff.ToMilliseconds(),
> wrong units here (message says secs)
Done


Line 167:                                 FLAGS_safe_time_max_lag_ms);
> same
Done


Line 173: void TimeManager::MakeWaiterTimeoutMessageUnlocked(Timestamp 
timestamp, string* error_message) {
> probably easier to just return string from this method instead of the out-p
yeah, it's ugly so I tried to hide it. moved back.


Line 188:     if (mode_ == NON_LEADER && timestamp > last_safe_ts_) {
> ah ok, so you are doing the thing that I asked about in the commit message.
Done


http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/consensus/time_manager.h
File src/kudu/consensus/time_manager.h:

PS32, Line 117: is 
> if
Done


http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

Line 2345:       "--maintenance_manager_polling_interval_ms=300",
> typo?
Done


Line 2664:   //workload.set_num_read_threads(2);
> hrm?
Done


http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/integration-tests/test_workload.cc
File src/kudu/integration-tests/test_workload.cc:

Line 270:   start_latch_.Reset((uint64_t)num_write_threads_ + 
(uint64_t)num_read_threads_);
> nit: why these casts?
cuz tidy bot complains otherwise. removed


http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/tools/tool_action_remote_replica.cc
File src/kudu/tools/tool_action_remote_replica.cc:

Line 100:     // and thus cannot do snapshot scans.
> I think ORDERED is still useful here.
the two were mashed together when we added (or after) fault tolerance. now the  
the tablet service will return an error if ordered is set but not snapshot.
I agree that order is important but not being able to dump a server data if its 
alone is worse imo.
Left a TODO for when we have another read mode.


http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

PS32, Line 84: time
> 'of time (in milliseconds)' and remove 'in milliseconds' from the end of th
Done


PS32, Line 85: it's 
> its
Done


PS32, Line 85: allows to
> 'allows us to' or 'allows waiting'
Done


Line 1329:               ": has no lower or upper bound.");
> lots of reindentation in this file. mind reverting that part of the patch?
must have pressed the indent file key by mistake. Done


PS32, Line 1387: Retunrs
> typo
Done


PS32, Line 1388: CheckIfAncientHistory
> rename to 'VerifyNotAncientHistory' or 'VerifyAfterAHM' or something. Other
Done


Line 1759: MonoTime ClampDeadline(const MonoTime& deadline, bool* was_clamped) {
> rename to ClampScanDeadlineForWait or something so it's clearer where this 
Done


PS32, Line 1816: ahm
> nit: AHM
Done


PS32, Line 1842: allows to
> allows us to
Done


PS32, Line 1850: s.ok() 
> !s.ok() is redundant here with s.IsTimedout()
Done


Line 1861:   if (!s.ok() && s.IsTimedOut() && was_clamped) {
> same
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8532fdb069c8bee7f3e08ffe74cab0273885cc8e
Gerrit-PatchSet: 32
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to